The purpose of the project entitled "Query Planning and Optimization in Object-oriented Knowledge Bases",
sponsored by AFOSR, is to extend a deductive object base
with knowledge-based problem solving and planning, which is intended to realize the concept of very-high-level
programming in a database system. The input to such a system is a specification of the problem to be solved (as a
set of goals) and the output is a solution of the problem. The knowledge-based problem solving system deals
with problems that do not change the state of a database, and the planning system processes goals that require some
state changes in the database.

In our approach, the knowledge-based problem solving system stores a set of problem models (such as graph
problems) so that an input problem can be matched through an object-oriented specialization/generalization process. If
no problem model matches a given problem, the user is provided with a high-level programming
system that allows a top-down problem solving process to be carried out until some matches can be found at
detailed implementation stages. For the planning system, we have found that most of the conventional
approaches based on the formulation of operations-preconditions-postconditions are inefficient.
We have classified general planning problems into several classes so that each class can be solved individually and
efficiently. With this approach, each class of planning problems can be constructed as a problem model and
included in the general problem solving system.
FINAL TECHNICAL REPORT


1.  SUMMARY 


1.1  PROJECT  OBJECTIVES  AND  ACCOMPLISHMENTS 

The purpose of this project is to extend a deductive object base with knowledge-based problem
solving and planning, which is intended to realize the concept of very-high-level programming in
a database system. The input to such a system is a specification of the problem to be solved (as a set
of goals) and the output is a solution of the problem. The knowledge-based problem solving
system deals with problems that do not change the state of a database, and the planning system
processes goals that require some state changes in the database.

In our approach, the knowledge-based problem solving system stores a set of problem models
(such as graph problems) so that an input problem can be matched through an object-oriented
specialization/generalization process. If no problem model matches a given problem, the
user is provided with a high-level programming system that allows a top-down problem solving
process to be carried out until some matches can be found at detailed implementation stages.

For the planning system, we have found that most of the conventional approaches based on the
formulation of operations-preconditions-postconditions are inefficient. We have
classified general planning problems into several classes so that each class can be solved individually
and efficiently. With this approach, each class of planning problems can be constructed as a problem
model and included in the general problem solving system.

Our accomplishments can be summarized as follows:

1. Within an object-oriented knowledge base framework, we have developed the necessary
constructs to define problem models. Matching between a given problem and problem models is
accomplished through theorem proving.

2. We have developed an object-oriented knowledge base programming environment in which
object-oriented programs can be specified and executed easily and efficiently.

3. We have designed an object-oriented planning system that employs a high-level query language
as the goal specification language. We have classified planning problems according to different
constraints. With this classification, we can concentrate on classes of planning problems that allow efficient
solutions. These problems include simple operator ordering problems, consumer ordering
problems, producer ordering problems, and consumer-producer ordering problems.

4. We have found that most planning problems that require sequencing of operations that compete
for space belong to the above classes of planning problems. Consequently, the previous
approaches that universally employ the pre-condition/post-condition formulation are neither
appropriate nor necessary, and efficient solutions can be developed with algorithmic approaches.


This report summarizes the theories and algorithms devised for items 1-3 listed above.
Publications for item 4 are included in the appendix.
2. PROGRAM TRANSFORMATION


Programming activities are knowledge-intensive: extensive knowledge of application
domains and programming languages is required. Even though there have been numerous approaches
in the field of program transformation and verification [PaSt83] [Feat86] [MiIX}86], their practical
utility is still limited. This is partially due to the inability to properly manage the large amount of
knowledge from different domains. For this reason, the role of knowledge management for various
problems in software engineering is getting more attention from researchers [Bars87]. Recently,
several software engineering environments have been designed with strong support from knowledge
bases or databases [Pene86] [Qem88] [Estu86] [HuKi88] [SmKW85].

This section describes a Knowledge-Based Program Transformation System (KBPTS) that has
been designed on top of an object-oriented knowledge base for the purpose of automatic program
transformation and optimization. In KBPTS, a program can be specified by means of a variation of
C++ which allows object classes and their associated methods to be functionally specified before they are
implemented. In the object-oriented knowledge base, a collection of abstract algorithms is stored as a
library of algorithms. As with application programs, the functionality of each abstract algorithm is given
in addition to the implementation. A program in KBPTS is developed by first specifying its
functionality. The transformation system then searches for an abstract algorithm whose functionality
matches that of the program. If the search succeeds, the program is replaced by the implementation of
the abstract algorithm (with proper instantiations), which is supposed to be efficient.


2.1  RELATED  WORK 

Work related to the programming system described in this report can be classified into five
categories: implementation of sets in object-oriented programming, object-oriented databases, program
transformation, software reuse, and knowledge-based editors.

Implementation  of  Sets  in  Object-Oriented  Languages 

Among existing object-oriented programming languages, Smalltalk [GoAR89] may have the
most object-oriented implementation of sets. Since sets, as well as bags, arrays, dictionaries, sorted
collections, and others, are subclasses of the class collection, every instance of such classes is an
independent object containing other objects. Depending on the class, various methods are available,
some of which are methods to count, add, delete, copy, replace, or sort elements. For some classes,
elements need not all be of the same type.

The influence of Smalltalk is evident in two languages [BGGH91]. The first, Objective C, is a
superset of C. In addition to arrays and structures, which are a part of C, it supports sets, dictionaries,
and ordered collections, which are based on the collection classes found in Smalltalk. The second
language, Actor, also has collection classes which are taken directly from Smalltalk. Finally, there is
one very popular object-oriented language, C++, which, like Objective C, is a superset of C. There is
no implementation of sets or any collection classes in C++ other than the basic C arrays and
structures. In [Koen92], several aspects of designing a container or collection class were considered in
extending C++.

Although the provision of sets is an important feature of a declarative programming language,
to our knowledge very few object-oriented programming languages have listed declarative
programming as a targeted goal; this is reflected by the fact that no further specification constructs such as
variable quantifiers have been provided in such languages beyond sets.

Object-Oriented  Databases 

Several object-oriented databases have been proposed: a partial list includes GEMSTONE
[MaSt86] [CoMa84], Iris [Fish87], Ariel [Macg85], EXODUS [Carc86], Trellis/Owl [Obri86] and
POSTGRES [RoSt87] [StHH87]. Most of these systems have been designed to simulate semantic data
models by including mechanisms such as abstract data types, procedural attributes, rules, inheritance,
union type attributes and shared subobjects. Declarative programming in such systems is confined to
declarative retrieval of persistent objects.

More recently, a number of object-oriented databases have become available commercially; some
examples are Versant, O2, and ObjectStore [CACM91]. Instead of providing a query language, most
of them require persistent objects to be accessed directly from within an object-oriented program. Most
of the above systems have been implemented with an extended language compiler and a separate
database system, so that accesses to persistent objects/data are implemented by an object storage system.
To our knowledge, little consideration has been given to global optimization. Moreover,
although the lack of a query language can eliminate the gap between a database system and a
programming language, it also makes declarative programming an even more remote goal of such
systems.

Program  Transformation 

Program transformation includes predefined transformations (e.g., rewriting rules) and program
construction from a high-level nonexecutable specification to a low-level executable form. Existing
transformation systems can be divided into two classes: those that perform transformations
automatically and those which are guided by users. The CIP project [CIP84] [BMPP89] focused on
correctness-preserving and source-to-source program transformation at different levels of abstraction.
The development process is guided by the programmer, who has to choose appropriate transformation
rules. The user guidance accomplishes the creative part in the development process. DEDALUS
[MaWa79] and KBSA [PrSm88] attempted to automate the transformation selection process.
DEDALUS was able to create a program, a correctness proof, and a proof of termination for programs
of a limited scope. DEDALUS selects candidate rules by pattern-directed invocations and applies
those rules sequentially. KBSA focused on automatic algorithm design, deductive inference, finite
differencing, and data structure selection. Given a problem description, KBSA generates an optimal
program through correctness-preserving transformations. One of the major problems with existing
automatic program transformation systems is that most of them try to transform a program from
scratch; consequently, the lack of a driving force in the design process leads to only limited success
in practical applications. Even though some search approaches such as cost functions and efficient
search methods have been employed, global strategies have yet to be integrated effectively [Scha90].

Software  Reuse 


Software reuse appears at two levels of abstraction: reuse at the code level and reuse at the
specification level [Dill88]. While code-level reuse involves modifying existing code [PrFr87],
specification-level reuse is based on an external, often formal, program specification. Existing
methodologies include program transformation [Qiea84] [BoMu84] and software libraries
[BABK87] [WoSo88] [NTFT91]. Program transformations are used to refine a specification or an
abstract program defined in a very high level language into a program written in a target language
[KaGa87]. Software libraries require the ability to locate the appropriate software components based
on users' requests. Most such systems employ some kind of indexing technique or syntactic
matching for the purpose of searching; a more flexible and efficient matching mechanism remains
to be developed [Dill88].

Knowledge-Based  Editors 


Simple program editors have been extended into more powerful ones. Some incorporate an
understanding of the syntactic structure of the program being edited [MeFe81] [TeRe81]. This makes
it possible to support operations based on the parse tree of a program (e.g., inserting, deleting, and
moving between nodes in the parse tree). Syntax-based editors also ensure the syntactic correctness
of the program being edited. KBEmacs [Wate85] extended program editors further by including an
understanding of the algorithm structure of the program. By comparing the algorithm structures with
programming cliches, which are standard models of solving programming problems, KBEmacs can
intelligently assist programmers. KBEmacs helps programmers construct programs more rapidly
and more reliably by combining or modifying existing algorithmic cliches. The idea of using
algorithmic cliches is similar to that of using abstract algorithms in KBPTS. One difference is that cliches
are domain-dependent reusable components, while abstract algorithms are general ones which can be
applied to problems in various domains.


2.2 DECLARATIVE OBJECT-ORIENTED PROGRAMMING

For the purpose of declarative programming, KBPTS extends the C++ programming language
with constructs for functional specifications and associative programming, which are described in
the following subsections.


2.2.1 FUNCTIONAL SPECIFICATIONS

In KBPTS, functional specifications are accomplished with the declaration of sets and the
availability of quantifiers for logical expressions:

Set  Classes 


Given a class α, the class of all possible ordered sets which can be derived from instances of α is
declared as:

class set_of_α {
    //methods
};

The following declaration defines a set a of class α:

set_of_α a;

Set Projection

Given a set or an object a of class α, the following notation designates the projection of a on
attributes A_1, ..., A_n:

Head and Tail

The function head() applied to a set returns the first element of the set; the function tail() returns the
remainder of the set. The symbol NIL designates the empty set.

Universal  Quantifier 

A variable in a logical expression can be universally quantified by the quantifier:

(forall <variable_id> in <set_id>)

Existential Quantifier

A variable in a logical expression can be existentially quantified by the quantifier:

(exist <variable_id> in <set_id>)

Membership

The following function returns 1 if <variable_id> is an element of <set_id>:

<set_id>:member(<variable_id>);


□ 
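The constructs listed above can be approximated in standard C++. The following is only an illustrative sketch, not the KBPTS implementation: the class template SetOf and its member names are assumptions chosen to mirror set_of_α, head(), tail(), member(), forall, and exist, and an ordered set is simply stored as a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the report's set_of_<class>: an ordered set
// kept in a vector. All names here are illustrative assumptions.
template <typename T>
class SetOf {
public:
    SetOf() = default;
    explicit SetOf(std::vector<T> items) : items_(std::move(items)) {}

    // An empty SetOf plays the role of NIL.
    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

    // head(): the first element; the set must not be empty.
    const T& head() const {
        assert(!items_.empty());
        return items_.front();
    }

    // tail(): the remainder of the set after removing the head.
    SetOf tail() const {
        if (items_.empty()) return SetOf();
        return SetOf(std::vector<T>(items_.begin() + 1, items_.end()));
    }

    // member(x): true if x is an element of the set.
    bool member(const T& x) const {
        return std::find(items_.begin(), items_.end(), x) != items_.end();
    }

    // (forall v in set) pred(v)
    template <typename Pred>
    bool forall(Pred pred) const {
        return std::all_of(items_.begin(), items_.end(), pred);
    }

    // (exist v in set) pred(v)
    template <typename Pred>
    bool exist(Pred pred) const {
        return std::any_of(items_.begin(), items_.end(), pred);
    }

private:
    std::vector<T> items_;
};
```

With this sketch, a quantified expression such as (forall v in s) (v > 0) would be written s.forall([](int v) { return v > 0; }).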

Based on the above, a class declaration can define the functionality of each method and any logical
property of its instances in addition to the structure of the class. The general form of a class
declaration is:

class <class_id> {

    <class_id> [<variable_id>] <method_id>(parameter_1:domain_1, ...,
                                           parameter_n:domain_n);
        [<= logical expression;]
        ...
        [<= logical expression;]

} [<= logical expression]
  ...
  [<= logical expression]

In the above, each logical expression associated with a method, if specified, describes the desired
relationships among the parameters, the target object (which is represented as Self), and any returned
object (which is represented as a variable that is defined after the returned class). Multiple logical
expressions can be associated with a method, each of which designates an alternative functional
specification. The symbol "+Self+" designates the updated value of Self in case the method changes
the contents of the object. On the other hand, each logical expression associated with the class, if
specified, describes the logical property of each instance. Multiple logical expressions can be
associated with a class, each of which designates an alternative description of the class. Such descriptions
are typically used to define derived classes and can be used in the program transformation process. A
set class cannot be associated with any logical expression (but its methods can), as its properties are
defined by the class it is derived from.

Example  2-1 

The following declarations define the classes for an airline reservation system.

class city {
public:
    //attributes and methods
};

class set_of_city { ... };

class flight {
public:
    city source, destination;
    float fare;

    void add_fare(int amount);
        <= (+Self+ = Self + amount)
    //others
};

class set_of_flight { ... };

class airline {
public:
    set_of_city cs;
    set_of_flight fs;

    int connection(city s, city t, set_of_flight c, float fare);
        <= Self.cs.member(s) &&
           Self.cs.member(t) &&
           (c.head().source == s) &&
           Self:connection(c.tail().destination, t, fare1) &&
           (fare = c.head().fare + fare1)
        <= Self.cs.member(s) &&
           Self.cs.member(t) &&
           (c.head().source == s) && (c.head().destination == t) &&
           (c.tail() == NIL) && (fare == c.head().fare)

    set_of_flight d cheapest_connection(city s, city t, float fare);
        <= (connection(s,t,d,fare) == 1) &&
           !((exist c in Self.fs) (connection(s,t,c,fare1) &&
           (fare1 < fare)))
    //others
};

Note that Self and self are different; the latter is a pointer in


□ 
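To make the two alternative specifications of connection concrete, the following C++ sketch checks whether an ordered chain of flights connects two cities and, if so, computes the total fare; a second function picks the cheapest valid chain from a list of candidates. This is an assumption-laden illustration rather than the system's algorithm: the Flight structure, the use of city names as strings, and the function names is_connection and cheapest_connection are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical flat representation of a flight; cities are plain names.
struct Flight {
    std::string source;
    std::string destination;
    double fare;
};

// True if the flights c[from], c[from+1], ... form a connection from s to t
// over known cities. On success total_fare receives the summed fare.
bool is_connection(const std::set<std::string>& cities,
                   const std::string& s, const std::string& t,
                   const std::vector<Flight>& c, std::size_t from,
                   double& total_fare) {
    if (from >= c.size()) return false;          // an empty chain connects nothing
    if (cities.count(s) == 0 || cities.count(t) == 0) return false;
    const Flight& head = c[from];
    if (head.source != s) return false;
    if (from + 1 == c.size()) {                  // the tail is NIL: a direct flight
        if (head.destination != t) return false;
        total_fare = head.fare;
        return true;
    }
    double rest = 0.0;                           // otherwise recurse on the tail
    if (!is_connection(cities, head.destination, t, c, from + 1, rest)) {
        return false;
    }
    total_fare = head.fare + rest;
    return true;
}

// Index of the cheapest valid connection among the candidates, or -1 if
// none is valid. best_fare is only meaningful when the result is not -1.
int cheapest_connection(const std::set<std::string>& cities,
                        const std::string& s, const std::string& t,
                        const std::vector<std::vector<Flight>>& candidates,
                        double& best_fare) {
    int best = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double fare = 0.0;
        if (is_connection(cities, s, t, candidates[i], 0, fare) &&
            (best < 0 || fare < best_fare)) {
            best = static_cast<int>(i);
            best_fare = fare;
        }
    }
    return best;
}
```

The recursion mirrors the declarative form: the base case corresponds to the alternative whose tail is NIL, and the recursive case corresponds to the alternative that adds the fare of the head to the fare of the remaining chain.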


Example  2-2 

The following declarations define a class father and a derived class ancestor.

class father {
    char father[30], child[30];
    //others
};

class ancestor {
    char ancestor[30], descendant[30];
    //others
} <= (exist s in father) (exist u in ancestor) ((s.father == Self.ancestor) &&
         (s.child == u.ancestor) && (u.descendant == Self.descendant))
  <= (exist s in father) ((s.father == Self.ancestor) &&
         (s.child == Self.descendant))

□ 
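The derived class ancestor is the transitive closure of father. A minimal C++ sketch of how such a derived class could be evaluated is given below; the Father structure and the function is_ancestor are hypothetical names, and the visited set is an added safeguard against cyclic data that the declarative definition does not mention.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one instance of the class father.
struct Father {
    std::string father;
    std::string child;
};

// ancestor(a, d) holds if father(a, d), or if father(a, x) and
// ancestor(x, d) for some x. visited guards against cyclic data.
bool is_ancestor(const std::vector<Father>& fathers,
                 const std::string& a, const std::string& d,
                 std::set<std::string>& visited) {
    if (!visited.insert(a).second) return false;   // a was already explored
    for (const Father& s : fathers) {
        if (s.father != a) continue;
        if (s.child == d) return true;             // direct father
        if (is_ancestor(fathers, s.child, d, visited)) return true;
    }
    return false;
}

// Convenience overload that starts with an empty visited set.
bool is_ancestor(const std::vector<Father>& fathers,
                 const std::string& a, const std::string& d) {
    std::set<std::string> visited;
    return is_ancestor(fathers, a, d, visited);
}
```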


2.2.2 ASSOCIATIVE PROGRAMMING

The availability of sets as described in the last subsection allows objects to be retrieved in an
associative fashion. The following functions/statements are used to access the elements in a set:

1. <set_id>:insert(<variable_id>);

2. <set_id>:delete(<variable_id>);

3. (foreach <variable_id> in <set_id>) statement;

Example  2-3 

Assume the following declarations:

class rectangle {
public:
    vertex a,b,c,d;

    int intersect(rectangle r); //test if two rectangles intersect
    int size(); //returns the size of a rectangle
    void plot(); //plot a rectangle
    //others
};

class vertex {
public:
    float x,y;
    //others
};

class block {
    //attributes
    void plot(); //plot a block
};

class on_top {
public:
    block top,bottom;
    //others
};

class set_of_block { ... };
class set_of_on_top { ... };
class set_of_rectangle { ... };

set_of_block sb;
set_of_on_top sot;
set_of_rectangle sr;
block b;
on_top a;
rectangle s,t,u;

The following are some example statements which access objects associatively:

//plot pairs of rectangles of sr which intersect each other
(foreach t in sr)
    (foreach u in sr)
        if (t.intersect(u) == 1) {t.plot(); u.plot();}

//plot the smallest rectangle in sr
(foreach t in sr)
    if !((exist s in sr) (s.size() < t.size())) t.plot();

//plot each block which does not support any other block
(foreach b in sb)
    if !((exist a in sot) (a.bottom == b)) b.plot();


□ 
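For comparison, associative access of this kind can be imitated in standard C++ with loops and std::any_of. The sketch below is an illustration only: Rect is a hypothetical axis-aligned rectangle (the rectangle class above stores four vertices instead), plotting is replaced by returning the selected objects, and, unlike the nested foreach, each intersecting pair is reported once and a rectangle is not paired with itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical axis-aligned rectangle: lower-left (x1, y1), upper-right (x2, y2).
struct Rect {
    double x1, y1, x2, y2;
};

double area(const Rect& r) { return (r.x2 - r.x1) * (r.y2 - r.y1); }

bool intersects(const Rect& a, const Rect& b) {
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

// Index pairs (i, j), i < j, of rectangles in sr which intersect each other.
std::vector<std::pair<std::size_t, std::size_t>>
intersecting_pairs(const std::vector<Rect>& sr) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < sr.size(); ++i) {
        for (std::size_t j = i + 1; j < sr.size(); ++j) {
            if (intersects(sr[i], sr[j])) pairs.emplace_back(i, j);
        }
    }
    return pairs;
}

// Every rectangle t for which no rectangle s in sr is strictly smaller.
std::vector<Rect> smallest(const std::vector<Rect>& sr) {
    std::vector<Rect> out;
    for (const Rect& t : sr) {
        const bool smaller_exists = std::any_of(
            sr.begin(), sr.end(),
            [&t](const Rect& s) { return area(s) < area(t); });
        if (!smaller_exists) out.push_back(t);
    }
    return out;
}
```

Both functions scan the whole set for every element, which is the direct reading of the declarative statements; no attempt is made here to optimize the evaluation order.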


2.3 OBJECT-ORIENTED LOGIC SYSTEM

A set of class definitions as described in Section 2.2 can be translated into an object-oriented logic
system. Formally, we define an object-oriented logic system to be a two-level system. The first level,
or the object level, is a tuple L_O = (O, G_O, DL_O), where O is a first order object language, G_O is an
object representation of O, and DL_O is a set of deductive laws. Similarly, the second level, or the
schema level, is a tuple L_S = (S, G_S, DL_S), where S is a first order schema language, G_S is an object
representation of S, and DL_S is a set of deductive laws.
Consider an object base G_O, namely a set of classes and their associated methods. We define the
first order schema language consisting of: a set of constants (which start with an upper case
letter), a set of variables (written in upper case letters), and the following predicates (in lower case) to
describe object classes and relations:

1. class(a, a_1, ..., a_n) is true if a is the name of a class of objects, and the attributes of each object
of class a are a_1, ..., a_n. The symbol set_of_a designates the class of all possible ordered sets
which can be derived from the objects in class a.

2. a:method(m, d_1, ..., d_n) is true if a is a class, m is a method, and the domain of the i-th
parameter of the method is d_i.

3. attribute(a,b,c) is true if the attribute b of class a has the domain c, where c is a set.

4. The predicate instance_of(a,b) is true if object a is an instance of class b; the predicate
member_of(a,b) is true if object a is an instance of set b.

Similarly, we define the first order object language consisting of: a set of constants (written in lower
case letters), a set of variables (written in upper case letters), an n-place predicate symbol m for each
n-ary method m (for simplicity, we shall assume that all method names are distinct), and the
following predicates (in lower case) to describe objects and relationships among a set of objects:

1. The predicate a:m(x_1, ..., x_n) is true if the method m of some class is applied to the object a of
the same class with arguments of legal values.

2. The predicate a(t) is true if t is an instance of class a. The predicate set_of_a(t) is true if t is
an instance of the class set_of_a (i.e., t is a set whose elements are of class a).

3. The predicate member_of(a,b) is true if the object a is an element of the set b. The notation
[H|T], where H and T are variables or constants, designates a set whose first element is H and
the rest of the set is T.

Finally, following the syntax and semantics of PROLOG, both a schema-level deductive law and an
object-level deductive law are expressed in the form of:

f :- f_1, f_2, ..., f_n.

In both cases, f is called a derived predicate. If a predicate f is defined in terms of:

f :- e_1.
...
f :- e_n.

its semantics is

f <=> e_1 | e_2 | ... | e_n.

If n = 1, this results in f <=> e_1.
For simplicity, from now on we shall use the notation

class(a, a_1:d_1, ..., a_n:d_n)

in place of the set of predicates:

class(a, a_1, ..., a_n)
attribute(a, a_1, d_1)
...
attribute(a, a_n, d_n)

Example  2-4 

Suppose we have an object class called city with only one attribute, state, whose domain is string,
and an object class called flight with the following attributes:

1. source, whose domain is city;

2. destination, whose domain is city;

3. fare, whose domain is float;

Also assume that we have a class called airline with the following attributes:

1. cs, whose domain is set_of_city;

2. fs, whose domain is set_of_flight;

Associated with the class airline, assume we have a method called connection which takes two cities as
the input and returns a set of flights which connect the two cities. The structure of this system can be
described as follows, where expressions at schema level and expressions at object level are separated
by a line. The same convention will be followed in the remainder of the report.

class(city, state:string)
class(flight, source:city, destination:city, fare:float)
class(airline, cs:set_of_city, fs:set_of_flight)
airline:method(connection, set_of_flight, city, city, float)
airline:method(cheapest_fare, city, city, float)
airline(A) :- instance_of(A.cs, set_of_city), instance_of(A.fs, set_of_flight).
------------------------------------------------------------
A:connection(C,S,T,Fare) :-
    member_of(S,A.cs), member_of(T,A.cs),
    member_of(F,A.fs), (F.source = S), (F.destination = T),
    (C = [F]), (Fare = F.fare).

A:connection(C,S,T,Fare) :-
    member_of(S,A.cs), member_of(T,A.cs),
    member_of(F,A.fs), (F.source = S),
    A:connection(C1,F.destination,T,Fare1),
    C = [F|C1], (Fare = F.fare + Fare1).

A:cheapest_connection(D,S,T,Fare) :-
    A:connection(D,S,T,Fare),
    ~(A:connection(C,S,T,Fare1), (Fare1 < Fare)).

□ 

Similar to the two-level structure described above, given a class declaration presented in the general
form, we say a logical expression is presented at object level if it is associated with a method, and we
say a logical expression is presented at schema level if it is associated with a class. The class
declaration can be translated into an object-oriented logic system according to the following rules:

Quantifiers

A logical expression which is presented at object level and is existentially quantified in the form of
(exist t in c) d is translated to c(t), d (if c is a class) or member_of(t,c), d (if c is a set). A logical
expression which is presented at object level and is universally quantified in the form of (forall t in c)
d is translated to ~(c(t), ~d) (if c is a class) or ~(member_of(t,c), ~d) (if c is a set).

For the schema level, the same rules as described above apply except that any expression of the form α(a)
in the translated expression is replaced by instance_of(a, α) and any expression of the form
set_of_α(a) is replaced by instance_of(a, set_of_α).
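The object-level quantifier rules amount to a small rewriting step: an existential quantifier becomes a conjunction of a range predicate and the body, and a universal quantifier becomes the negation of a range predicate conjoined with the negated body. The following C++ sketch performs this rewriting on strings. It is a hedged illustration only; the enumerations and the functions range_predicate and translate are hypothetical, the body d is treated as an opaque string, and the negated body is parenthesized for readability.

```cpp
#include <cassert>
#include <string>

enum class Quantifier { Exist, Forall };
enum class DomainKind { Class, Set };

// Range predicate for variable t over c: c(t) for a class,
// member_of(t,c) for a set.
std::string range_predicate(DomainKind kind, const std::string& t,
                            const std::string& c) {
    return kind == DomainKind::Class ? c + "(" + t + ")"
                                     : "member_of(" + t + "," + c + ")";
}

// (exist t in c) d   becomes   range, d
// (forall t in c) d  becomes   ~(range, ~(d))
std::string translate(Quantifier q, DomainKind kind, const std::string& t,
                      const std::string& c, const std::string& d) {
    const std::string range = range_predicate(kind, t, c);
    if (q == Quantifier::Exist) return range + ", " + d;
    return "~(" + range + ", ~(" + d + "))";
}
```

For example, translate(Quantifier::Forall, DomainKind::Set, "T", "A.fs", "T.fare < 100") yields the string ~(member_of(T,A.fs), ~(T.fare < 100)).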

Structures 


A class definition presented in the form of

class α {
    domain_1 attribute_1;
    ...
    domain_n attribute_n;
    //methods
} <= c_1
  ...
  <= c_n

is translated to the following at the schema level:

class(α, attribute_1, ..., attribute_n)
attribute(α, attribute_1, domain_1)
...
attribute(α, attribute_n, domain_n)
α(Self) :- δ, c_1.
...
α(Self) :- δ, c_n.

where δ = instance_of(attribute_1, domain_1), ..., instance_of(attribute_n, domain_n). If an attribute
is presented in upper case, it should be converted to lower case. In case no c_i is
specified, it is translated to:

class(α, attribute_1, ..., attribute_n)
attribute(α, attribute_1, domain_1)
...
attribute(α, attribute_n, domain_n)
α(Self) :- δ.

Set  Classes 


A set class definition presented in the form of

class set_of_α {
    //methods
}

is translated to the following at schema level:

class(set_of_α)

Methods 

A method definition presented in the form of

d s c:m(parameter_1:domain_1, ..., parameter_n:domain_n)
    <= c_1
    ...
    <= c_n

is translated to:

c:method(m, d, domain_1, ..., domain_n)
Self:m(s, parameter_1, ..., parameter_n) :- δ, c_1.
...
Self:m(s, parameter_1, ..., parameter_n) :- δ, c_n.

where δ = domain_1(parameter_1), ..., domain_n(parameter_n). If an attribute is presented in upper
case, it should be converted to lower case. The variable Self can be replaced by any variable which is
distinct from the others; in this case all instances of Self in c_1, ..., c_n should be replaced. Note that
the convention used in this report is that after translation, the first parameter of a method predicate
corresponds to Self, and the second parameter corresponds to the returned object (if specified).

Sets 

If both functions head and tail appear in a logical expression and are applied to the same object a of
class set_of_α in the form of a.head() and a.tail() at object level, all instances of a.head() are
replaced by a variable T and all instances of a.tail() are replaced by a variable S, where S and T are
two variables which are distinct from the others. In the meantime, the following expression should be
added to the translated expression:

α(T), set_of_α(S), A = [T|S]

If only head is applied to an object a of class set_of_α at object level, all instances of a.head() are
translated to a variable T, where T is a variable which is distinct from the others. In the meantime,
the predicate α(T) should be added to the translated expression. On the other hand, if only tail is
applied to an object a of class set_of_α at object level, all instances of a.tail() are translated to a
variable S, where S is a variable which is distinct from the others. In the meantime, the predicate
set_of_α(S) should be added to the translated expression.

For the schema level, the same rules as described above apply except that any expression of the form α(a)
in the translated expression is replaced by instance_of(a, α) and any expression of the form
set_of_α(a) is replaced by instance_of(a, set_of_α).

Membership 

A function call of the form a:member(b) at both levels is translated to member_of(b,a).


□ 


Example  2-5 

With  the  above  rules,  the  declarations  made  in  Example  2-1  can  be  translated  to  the  declarations 
presented  in  Example  2-4. 


□ 


In addition, the expression δ will be omitted from the examples for clarity.

2.4 OBJECT UNIFICATION

The presence of variables and constants at object level which are structured objects makes
unification at that level a rather complicated task. At first glance, given a predicate c(A) and
assuming the structure of c has been declared as class(c, a_1, ..., a_n), it can be translated to the
following set of predicates:

c(A)
attribute_value(A, a_1, A_1)
...
attribute_value(A, a_n, A_n)

where a predicate attribute_value(a,b,c) is true if object a has c as the value of its attribute b.
Now, any object expressed as A.a_j, 1 ≤ j ≤ n, can be translated to A_j. The same rule can be applied
recursively if any of the A_j's is a structured object. This mechanism seems to work well if the type of
A is known exactly. However, if c is object (which means A can essentially be any type of object) or
some unknown attribute of A is referenced (in the form of A.B, for example, where B is a variable),
the above mechanism would be stuck. In the following, we shall extend the conventional unification
algorithm in order to handle structured objects in general. Before proceeding, let us recall that the
disagreement  set  of  a  nonempty  set  W  of  expressions  is  obtained  by  "locating  the  first  symbol 
(counting  from  left)  at  which  not  all  the  expressions  in  W  have  exactly  the  same  symbol,  and  then 
extracting  from  each  expression  in  W  the  subexpression  that  begins  with  the  symbol  occupying  at  that 
position"  [CM.e69]. 
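The quoted definition can be illustrated with a minimal sketch. It assumes that compound expressions are encoded as nested tuples of the form (functor, arg_1, ..., arg_n) and that constants and variables are plain strings; this encoding and the function name disagreement_set are illustrative choices, not part of the report.

```python
def disagreement_set(terms):
    """Return the subterms found at the leftmost position where the terms differ.

    A term is a string (constant or variable) or a tuple (functor, arg, ...).
    An empty set means that all terms are identical.
    """
    terms = list(terms)
    if all(t == terms[0] for t in terms):
        return set()
    # The symbols at this position differ: the subterms themselves disagree.
    if any(not isinstance(t, tuple) for t in terms) or \
            len({(t[0], len(t)) for t in terms}) > 1:
        return set(terms)
    # Same functor and arity everywhere: scan the arguments from the left.
    for args in zip(*(t[1:] for t in terms)):
        found = disagreement_set(args)
        if found:
            return found
    return set()
```

For W = {p(x, f(y)), p(a, f(y))} the sketch returns {x, a}, the subexpressions starting at the first differing symbol.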

The unification algorithm is extended to include structured objects as follows:

Object-Oriented Unification Algorithm
step 0

Retrieve the types of each expression if known.
step 1

k := 0, W_k = W, σ_k = ∅, β = ∅;
step 2

If W_k is a singleton, stop with success and return σ_k. Otherwise, find the disagreement set D_k of W_k.
step  3 

If there exist elements u and v in D_k, consider the following:

1. If both u and v are predicate symbols, u and v cannot be unified (as they are different); stop
with failure.

2. If u = A_1.A_2...A_p and v = B_1.B_2...B_q, where each A_i, 1 ≤ i ≤ p, or B_j, 1 ≤ j ≤ q, is a
constant or a variable:

2-a If u and v cannot be made to be of the same type by any unifier which has not been applied
before, then backtrack to the previous decision point if backtracking is possible;
otherwise stop with failure.

2-b If u and v can be of the same type with a unifier δ which has not been applied before,
add this step as a decision point. Let δ = {(u • δ)/u, (v • δ)/v}. Also let β = β ∪ {(u • δ)/u,
(v • δ)/v}. If at this point there exists a set of unifiers of the form {w_1/y_1, ..., w_r/y_r},
where each w_i is of the form D_1.D_2...D_q.T_i for which T_i is a constant or a variable and
each y_i is of the form C_1...C_p.S_i for which S_i is a constant or a variable, consider the
following. If {T_1, ..., T_r} covers all the attributes of D_1...D_q and {S_1, ..., S_r} covers all the
attributes of C_1...C_p, then add D_1...D_q/C_1...C_p to δ. If {S_1, ..., S_r} covers all the attributes
of C_1...C_p but {T_1, ..., T_r} does not cover all the attributes of D_1...D_q, then add
D_{T_1,...,T_r}/C_1...C_p to δ. Otherwise go to step 4.

step  4 

Let σ_{k+1} = σ_k • δ, W_{k+1} = W_k • δ.


step 5

k = k + 1 and go to step 2.


□ 


Example  2-6 

Consider the following two expressions, assuming airline(G) and airline(A), where the class airline
is defined as in Example 2-1 and translated as in Example 2-4:

W = {p(G.ES, G.NS, G), p(A.fs, A.cs, A)}

According to the extended unification algorithm, initially β is ∅. The unifier δ = {A/G, fs/ES} unifies
G.ES and A.fs. Let δ = {(G.ES • δ)/G.ES, (A.fs • δ)/A.fs} = {A.fs/G.ES, A.fs/A.fs}. Also set
β = β ∪ {A.fs/G.ES, A.fs/A.fs} = {A.fs/G.ES, A.fs/A.fs}. At the end of the first iteration,
W_1 = {p(A.fs, G.NS, G), p(A.fs, A.cs, A)}. Similarly, a unifier for the second argument can be obtained as
{A.cs/G.NS, A.cs/A.cs} and β becomes {A.fs/G.ES, A.fs/A.fs, A.cs/G.NS, A.cs/A.cs}. At this point
{A.fs, A.cs} covers all the attributes of A and {G.ES, G.NS} covers all the attributes of G (based on
their types), so that the unifier {A/G} is added, and the resulting set of unifiers is returned
successfully.


□ 
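The coverage test of step 2-b, as used in Example 2-6, can be sketched as follows. This is only an illustration under stated assumptions: attribute-level substitutions are written as pairs of dotted strings, the table ATTRS stands in for the type information retrieved in step 0, and the names ATTRS and lift are hypothetical.

```python
# Attributes of each object, as known from its type (assumed for Example 2-6).
ATTRS = {"A": {"cs", "fs"}, "G": {"NS", "ES"}}

def lift(beta, attrs):
    """Return object-level substitutions D/C implied by attribute-level ones.

    beta is a list of pairs (w, y) such as ("A.fs", "G.ES"). D/C is produced
    only when the collected attributes cover every attribute of D and of C.
    """
    seen = {}
    for w, y in beta:
        d, t = w.rsplit(".", 1)
        c, s = y.rsplit(".", 1)
        if d == c:
            continue  # trivial substitution such as A.fs/A.fs
        ts, ss = seen.setdefault((d, c), (set(), set()))
        ts.add(t)
        ss.add(s)
    return [(d, c) for (d, c), (ts, ss) in seen.items()
            if ts >= attrs[d] and ss >= attrs[c]]

beta = [("A.fs", "G.ES"), ("A.fs", "A.fs"), ("A.cs", "G.NS"), ("A.cs", "A.cs")]
```

After the first iteration (the first two entries of beta) nothing is lifted, since only fs and ES are covered; with all four entries the sketch yields the pair ("A", "G"), matching the unifier {A/G} of the example.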


2.5 DEDUCTIVE UNIFICATION

A simple solution to the problem of mapping abstract algorithms to application algorithms can
proceed as follows:

1. Construct abstract object classes and their associated methods at the object level.

2. Compare the application with the abstract classes and their associated methods; if a match can be
identified, those abstract algorithms whose functionalities can be matched are instantiated.


We  consider  a  library  of  algorithms  as  a  collection  of  useful  methods.  Such  algorithms  should  be 
as  abstract  as  possible  so  that  they  can  be  instantiated  by  most  applications.  As  an  example,  we  can 
define  the  abstract  class  w_graph  (weighted_graph)  and  some  methods  which  implement  efficient 
graph-based algorithms as follows:

Graph-Based Classes and Methods

class(node)
class(edge,v1:node,v2:node,w:float)
class(w_graph,ns:set_of_node,es:set_of_edge)
w_graph:method(w_path,set_of_edge,node,node,float)

G:w_path(P:set_of_edge,A:node,B:node,W:float) :-
    member_of(E,G.es), (E.v1 = A), (E.v2 = B), (P = [E]), (W = E.w).

G:w_path(P:set_of_edge,A:node,B:node,W:float) :-
    member_of(E,G.es), (E.v1 = A),
    G:w_path(P1,E.v2,B,W1), (W = E.w + W1), (P = [E|P1]).

G:shortest_path(P:set_of_edge,A:node,B:node,W:float) :-
    G:w_path(P,A,B,W),
    ¬(G:w_path(P1,A,B,W1), (W1 < W)).
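The declarative reading of w_path and shortest_path can be mimicked procedurally. The sketch below is an illustration only: edges are assumed to be tuples (v1, v2, w), the function names mirror the methods above, and the guard that prevents an edge from being reused is an added assumption so that the enumeration terminates on cyclic graphs.

```python
def w_paths(edges, a, b, used=()):
    """Yield (path, weight) for every path from a to b that reuses no edge."""
    for e in edges:
        v1, v2, w = e
        if v1 != a or e in used:
            continue
        if v2 == b:
            yield [e], w                      # first rule: a single edge
        for p1, w1 in w_paths(edges, v2, b, used + (e,)):
            yield [e] + p1, w + w1            # second rule: edge followed by a path

def shortest_path(edges, a, b):
    """Return the paths for which no other path has a strictly smaller weight."""
    paths = list(w_paths(edges, a, b))
    return [(p, w) for p, w in paths if not any(w1 < w for _, w1 in paths)]

edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
```

On this sample graph, shortest_path(edges, "a", "c") keeps only the two-edge path of weight 3.0, because the direct edge of weight 5.0 is dominated.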

If we compare the functionality of the method shortest_path and the functionality of the method
cheapest_connection, we can find the following terms which syntactically correspond to each other:

vertex(V)                     city(C)
edge(E)                       flight(F)
w_graph(G)                    airline(A)
G:path(P,A,B,W)               A:connection(C,S,T,F)
G:shortest_path(P,A,B,W)      A:cheapest_connection(C,S,T,F)

(Note: in the remainder of the report, for clarity, we shall use the notation c:m(p1:d1,...,pn:dn) in
place of c:m(p1,...,pn) when the functionality of the method is given.)

In conventional unification algorithms, two predicates which have different predicate heads cannot be
unified. However, we know that the shortest_path algorithm in class w_graph can be used for
finding the cheapest connection in the class airline by properly instantiating the variables in the
shortest_path algorithm with those in the airline reservation system with the following substitutions:

Variables: {A/G, A.cs/G.ns, A.fs/G.es, S/A, T/B, F/E, C/P, Fare/W, F.fare/E.w}

Predicates: {airline/w_graph, connection_graph/w_path, cheapest_connection/shortest_path}

This  is  an  example  of  matching  with  analogy  [Gtrb81][Ders86][NTFT91],  and  in  order  to  perform  this 
we  need  an  analogical  unification  process.  This  can  be  accomplished  by  extending  the  object 
unification  algorithm  to  include  predicate  symbols  for  unification.  However,  this  approach  is  clearly 
purely  syntactic.  To  unify  two  programs  analogically,  it  is  required  that  each  program  be  described 
with  the  same  number  of  predicates  and  for  each  predicate,  with  the  same  number  of  arguments.  The 
predicates  used  in  the  application  need  to  be  carefully  designed  so  that  they  can  be  syntactically 
unified  by  those  associated  with  the  abstract  algorithms.  Consequently,  the  user  has  to  memorize  a 
large  number  of  predicates,  and  their  semantics,  in  order  to  reuse  the  abstract  algorithms. 

Another problem associated with the above approach is that the analogical unification algorithm
only considers the number of arguments when two predicates are matched; as a consequence some
random substitutions may be produced. As an example, consider two predicates is_equal_set(S1,S2)
and is_equal_tuple(T1,T2), where the former is true if two sets S1 and S2 are equal, and the latter is
true if two tuples T1 and T2 are equal. According to the analogical unification algorithm, they can be
unified. However, since the argument domains for the two predicates are different, the algorithm
testing for the equality of sets should be fundamentally different from that for tuples.

To solve the problems with second-order unification, a little more thought suggests that the
matching between an application program and an abstract program should be done in first order; this
implies that we should parameterize the structure of an abstract class and present it as a derived
class. The concept of parameterization is similar to that of templates in C++ [Suo91]. However, in
order to instantiate a template in C++, the programmer has to be aware of its existence. Declaring it
as a derived class can eliminate such a need, which is the theme of this research, so that the
association between an application and a template can be established transparently to the programmer.
With this, we can establish the following principle of matching between two methods P and Q:

If P <=> Q then P can be used to solve Q, and vice versa

Example 2-7

As an example, we can rewrite the graph-based methods as follows:

class(w_graph,NS:set_of_X,ES:set_of_Y)
graph(G) :-
    instance_of(G.NS,set_of_X), instance_of(G.ES,set_of_Y),
    attribute(Y,A,X), attribute(Y,B,X), attribute(Y,C,float).
w_graph:method(w_path,P:set_of_Y,S:X,T:X,W:float,L:int)

G:w_path(P:set_of_Y,S:X,T:X,W:float,L:int) :-
    member_of(E,G.ES), (E.A = S), (E.B = T),
    (P = [E]), (W = E.C), (L = 1).

G:w_path(P:set_of_Y,S:X,T:X,W:float,L:int) :-
    member_of(E,G.ES), (E.A = S),
    G:w_path(P1,E.B,T,W1,L1), (P = [E|P1]),
    (W = E.C + W1), (L = L1 + 1).

G:shortest_path(P:set_of_Y,S:X,T:X,W:float,L:int) :-
    G:w_path(P,S,T,W,L),
    ¬(G:w_path(P1,S,T,W1,L1), (W1 < W)).

To make the example more interesting, let us assume that a connection between two cities is restricted
to consist of either one flight segment or two flight segments. In addition, we assume that the method
connection is only interested in computing the fare for a connection between two cities. Note that
with this the number of arguments associated with the predicates w_path and connection are different:

class(city,state:string)
class(flight,source:city,destination:city,fare:float)
class(airline,cs:set_of_city,fs:set_of_flight)
airline:method(connection,C:set_of_flight,S:city,T:city,Fare:float)
airline:method(cheapest_fare,S:city,T:city,Fare:float)

A:connection(C,S,T,Fare) :-
    member_of(F,A.fs), (F.source = S), (F.destination = T),
    (C = [F]), (Fare = F.fare).

A:connection(C,S,T,Fare) :-
    member_of(F,A.fs), (F.source = S),
    member_of(H,A.fs), (H.source = F.destination),
    (H.destination = T), (Fare = F.fare + H.fare).

A:cheapest_fare(S,T,Fare) :-
    A:connection(C,S,T,Fare),
    ¬(A:connection(C',S,T,F'), (F' < Fare)).

We shall see that the method shortest_path can be used to solve the problem of cheapest_fare:

1. The object A_{cs,fs} forms a w_graph object. This can be proved by the following substitutions:

A.fs/G.ES
A.cs/G.NS
city/X
flight/Y
source/A
destination/B
fare/C
A_{cs,fs}/G

so that

airline(A) =>
    instance_of(A.fs,set_of_flight),
    instance_of(A.cs,set_of_city).

airline(A), attribute(flight,source,city),
attribute(flight,destination,city), attribute(flight,fare,float) =>
    graph(A_{cs,fs})

2. The methods w_path and shortest_path can be instantiated according to the above instantiations:

A_{cs,fs}:w_path(P:set_of_flight,S:city,T:city,W:float,L:int) :-
    member_of(E,A.fs), (E.source = S), (E.destination = T),
    (P = [E]), (W = E.fare), (L = 1).

A_{cs,fs}:w_path(P:set_of_flight,S:city,T:city,W:float,L:int) :-
    member_of(E,A.fs), (E.source = S),
    A_{cs,fs}:w_path(P1,E.destination,T,W1,L1), (P = [E|P1]),
    (W = E.fare + W1), (L = L1 + 1).

A_{cs,fs}:shortest_path(P:set_of_flight,S:city,T:city,W:float,LL:int) :-
    A_{cs,fs}:w_path(P,S,T,W,L), (L <= LL),
    ¬(A_{cs,fs}:w_path(P1,S,T,W1,L1), (L1 <= LL), (W1 < W)).

3. It can then be proved that

A:connection(C,S,T,Fare) => A_{cs,fs}:w_path(P,S,T,W,1)

with the following set of substitutions from the first law associated with A:connection:

S/S
T/T
F/E
C/P
Fare/W

It can also be proved that

A:connection(C,S,T,Fare) => A_{cs,fs}:w_path(P,S,T,W,2)

with the following set of substitutions from the second law associated with A:connection:

S/S
T/T
F/E
[H]/P1
H.fare/W1
Fare/W
1/L1
2/L

4. Similarly, it can be proved that

A_{cs,fs}:w_path(P,S,T,W,1) => A:connection(C,S,T,Fare)

with the following set of substitutions in the first law associated with A:connection:

S/S
T/T
E/F
P/C
W/Fare

and that

A_{cs,fs}:w_path(P,S,T,W,2) => A:connection(C,S,T,Fare)

with the following set of substitutions in the second law associated with A:connection:

S/S
T/T
E/F
P1/[H]
W1/H.fare
W/Fare

5. Based on 3 and 4, we can conclude that

A:connection(C,S,T,Fare) <=> A_{cs,fs}:w_path(P,S,T,W,1) ∨ A_{cs,fs}:w_path(P,S,T,W,2)

Subsequently, the following can be concluded:

A_{cs,fs}:shortest_path(P,S,T,W,2) <=> A:cheapest_fare(S,T,Fare)

This is because

A:connection(C,S,T,Fare) <=> A_{cs,fs}:w_path(P,S,T,W,L), (L <= 2)


□ 
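The equivalence concluded in Example 2-7 can be checked on sample data. The following sketch is illustrative only: flights are assumed to be tuples (source, destination, fare), the city codes and fares are made up, and the function names are hypothetical counterparts of connection and of w_path restricted to at most max_len segments.

```python
def connection_fares(fs, s, t):
    """Fares produced by the two connection rules (one or two segments)."""
    one = [f[2] for f in fs if f[0] == s and f[1] == t]
    two = [f[2] + h[2] for f in fs if f[0] == s
           for h in fs if h[0] == f[1] and h[1] == t]
    return one + two

def w_path_weights(es, s, t, max_len):
    """Weights of all paths from s to t with at most max_len edges."""
    if max_len == 0:
        return []
    out = []
    for a, b, c in es:
        if a != s:
            continue
        if b == t:
            out.append(c)
        out += [c + w for w in w_path_weights(es, b, t, max_len - 1)]
    return out

flights = [("SFO", "DEN", 100.0), ("DEN", "JFK", 150.0),
           ("SFO", "JFK", 300.0), ("JFK", "BOS", 80.0)]
```

With the bound L <= 2, both functions produce the same multiset of fares, so the minimum over either one gives the cheapest fare; raising the bound to 3 admits a three-segment path that connection does not describe.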


Example 2-8

As another example, consider the following two versions of sort:

Version 1:

set_of_integer:method(sort,B:set_of_integer)

A:sort(B) :- A:permutation(B), B:sorted().
[]:sorted().
[H|T]:sorted() :- not (member_of(X,T), (X < H)), T:sorted().

Version 2:

set_of_flight:method(sort,B:set_of_flight)
set_of_integer:method(sorted,B:set_of_integer)

A:sort(B) :- A:permutation(B), B_{fare}:sorted().
[]:sorted().
[H|T]:sorted() :- [H|T]:sorted_1(0).
[H|T]:sorted_1(N) :- (N <= H), T:sorted_1(H).

It can be proved by induction that [H|T]:sorted() => [H|T]:sorted_1(0) and therefore [H|T]:sorted() =>
[H|T]:sorted(0) as follows:

1. It is trivial that []:sorted() => []:sorted(0).

2. Assume that [H|T]:sorted() => [H|T]:sorted_1(0) (the Hypothesis). Now the following can be
proved:

[H'|[H|T]]:sorted() => not (member_of(X,[H|T]), (X < H')), [H|T]:sorted().

By the hypothesis and the above, and since [H|T]:sorted_1(N) => (N <= H), T:sorted_1(H):

[H'|[H|T]]:sorted()
=> (H <= H'), [H|T]:sorted().
=> (H <= H'), [H|T]:sorted_1(0).
=> (H <= H'), (0 < H), T:sorted_1(H).
=> (0 < H), (H <= H'), T:sorted_1(H).
=> (0 < H'), [H|T]:sorted_1(H').
=> (0 < H'), [H|T]:sorted_1(H').
=> [H'|[H|T]]:sorted_1(0).

Similarly, we can prove that [H|T]:sorted(0) => [H|T]:sorted() and conclude that [H|T]:sorted() <=>
[H|T]:sorted(0).
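The claimed equivalence of the two sorted predicates can also be checked exhaustively on small inputs. The sketch below is an illustration under two assumptions that are not stated in the listings: elements are non-negative integers (so that the initial bound 0 is harmless), and sorted_1 succeeds on the empty list. The function names are hypothetical.

```python
from itertools import product

def sorted_v1(xs):
    """Version 1: no element of the tail is smaller than the head, recursively."""
    if not xs:
        return True
    h, t = xs[0], xs[1:]
    return not any(x < h for x in t) and sorted_v1(t)

def sorted_1(xs, n):
    """Version 2 helper: every element is at least the previous bound n.

    The empty-list base case is an assumption; the listing does not show it.
    """
    if not xs:
        return True
    return n <= xs[0] and sorted_1(xs[1:], xs[0])

def sorted_v2(xs):
    """Version 2: delegate to sorted_1 with the initial bound 0."""
    return not xs or sorted_1(xs, 0)

def versions_agree(max_len, max_val):
    """Compare both versions on every list up to max_len over 0..max_val."""
    return all(sorted_v1(list(xs)) == sorted_v2(list(xs))
               for n in range(max_len + 1)
               for xs in product(range(max_val + 1), repeat=n))
```

A check such as versions_agree(4, 3) exercises every list of up to four elements drawn from 0..3; it is a sanity test, not a substitute for the inductive proof.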


□


3. OPTIMIZATION OF QUERY PROGRAMS

We define an object-oriented database program to be a set of statements (function calls in a
LISP-flavored programming language) whose execution is sequenced by a set of control constructs.
These statements in general operate on a set of programming objects (i.e., variables and constants) and
database objects which are classified into different types. The major difference between a programming
object and a database object is that the latter is persistent. However, the contents of a programming
object can be assigned to a (compatible) database object and vice versa. From a performance point
of view, we feel that a major difference between a database programming system and an ordinary
programming system should be that a database program needs to be evaluated by the database programming
system rather than by some extension of an ordinary compiler, due to the large volume of data
involved. On the other hand, it would be inappropriate to loosely couple an ordinary compiler (with a
pre-processor) and a database query optimizer, due to the communication overhead and the lack of
global considerations.

This section studies the approach to globally optimizing the evaluation of database programs
within a prototyped object-oriented database programming environment OASIS (an Object-oriented
And Symbolic Information System), which is a database system intended to extend the conventional
UNIX programming environment with persistent customized objects, object-oriented database
programming, and symbolic information management. The OASIS query languages extend conventional
database query languages with procedural methods and general control statements. As the complexity
of the languages makes it difficult, if not impossible, to devise a “query optimizer” based on a
universally applicable algorithm, the OASIS query interpreter optimizes the performance of OASIS
programs based on a collection of “basic patterns”, where each pattern is associated with a
separate query optimization algorithm. Consequently, an OASIS program can be divided into a set of
segments and each segment is optimized separately.

In this section, we describe the optimization techniques for a set of basic patterns consisting of
iterative statements and a set of nested statements. Such statements occur most frequently in query
programs and are different in nature from traditional nested queries (which are mainly used for the
purpose of aggregation). The optimization techniques discussed in the following include an extended
decomposition algorithm, evaluation of multiple conditions, data dependence analysis, and optimization
of queries with arbitrary nesting. The conventional query decomposition algorithm [WoYo76] is
extended to incorporate the evaluation of procedural methods. When a series of conditional statements
is included in a nested loop, these statements can be transformed into independent statements so
that common subexpressions can be shared to reduce the evaluation cost. When update operations are
included in nested statements, data dependencies among statements are taken into account for proper
optimization. Finally, for a general query which is an arbitrary combination of basic patterns, a global
optimization strategy is discussed.

3.1  PREVIOUS  WORK 

In the past, several database programming languages have been proposed and/or implemented
[AtBu87]. Some database query languages can be embedded in a host programming language (e.g., SQL in
PL/1 [Astr76] and QUEL in C [SWKH76]). To our knowledge, most of the above systems have been
implemented with an extended language compiler and a separate database system so that accesses to
persistent objects/data are optimized by the database system at the query level. Little consideration has
been given to global program optimization of the kind an ordinary compiler performs.

Along another direction, extensive research has been reported on the subject of multiple-query
optimization and evaluation of nested queries for relational databases. In general, multiple-query
optimization procedures consist of two parts [PaSe88]: identifying common sub-expressions and
constructing a global access plan. Although detection of common sub-expressions or applicability of
access paths may be computationally intractable or even undecidable if a set of arbitrary
sub-expressions is considered [Jark84], various approaches have been proposed [JaKo84] [GrMi81]
[Jark84] [ChMi82] [ChMi86]. Given the information of sharing, several heuristic search algorithms
have been discussed [GrMi80] [Sell88] [PaSe88] [PaTL89]. On the other hand, optimization of nested
queries in a relational database language such as SQL has been discussed extensively in [Kim82], [Kim84]
and [GaWo87], for which the major concern for nested queries has been the treatment of aggregation
functions.

Multiple queries and nested transactions, in general, can be regarded as special cases of database
programs. Consequently, the techniques developed in these two areas can be applied to optimize
qualified segments in a database program as we will describe later.

3.2 OVERVIEW OF AN OBJECT-ORIENTED AND SYMBOLIC INFORMATION SYSTEM

In this section, we briefly review the essence of OASIS and introduce the schema definition for
a small database as an example.

3.2.1 THE ARCHITECTURE OF OASIS

The  overall  architecture  of  an  OASIS  environment  is  shown  in  Figure  3.1,  which  consists  of  a 
database,  a  knowledge  base,  a  meta-knowledge  base  and  an  OASIS  database  programming  language 
interpreter: 


Figure 3.1. OASIS Architecture.

(1) The database contains a set of methods and a set of persistent objects that are organized into
classes.

(2) On top of the objects and methods, there are a set of integrity constraints and a set of view
definitions, stored in a textual form.

(3) An OASIS database program interacts with the OASIS environment through the OASIS interpreter.
A user interacts with the OASIS system with functions written in OASIS-LISP (which is an
extension of LISP) or OASIS-C (which is an interpretable version of C); both types of functions
can be used in an interleaved way to accomplish a complex user query.

In OASIS, five classes have been predefined: integer, float, string, symbol¹ and list, where the
first  four  are  referred  to  as  primitive  classes.  A  list  object  is  a  composite  object  with  three  attributes: 
name ,  car  and  cdr ,  where  name  stands  for  the  name  of  the  list,  car  refers  to  the  first  element  of  the 
list,  and  cdr  refers  to  a  list  that  is  composed  of  the  remaining  elements  in  the  list.  A  summary  of  the 
database  definition  language  of  OASIS  is  given  in  Appendix  A. 


3.2.2 A SMALL DATABASE AS AN EXAMPLE

In this section, we present a schema definition which describes a small world including various
obstacles and moving cars. The meaning of each class and attribute is self-explanatory. Note that a
class (recursively) inherits the attributes and the methods of its superclass unless the class overrides
them. A class can redefine a method with the same name as in its superclass and rename an attribute
in its superclass (e.g., the notation TID/OID can be used to change an attribute name OID into TID).
For example, each entity in the class triangle has TID, SIZE, COLOR, and its three nodes as its
attributes. The class hierarchy of this database is as follows:

<classes> 

(defclass car_world (CWID int) (key CWID))

<subclasses>

(defsubclass obstacles car_world (OID/CWID int) (SIZE int) (COLOR string) (key OID))

<subclasses>

(defsubclass triangle obstacles (TID/OID int) (NODE1 position) (NODE2 position)
    (NODE3 position) (key TID))

(defsubclass rectangle obstacles (RID/OID int) (NODE1 position) (NODE2 position)
    (key RID))

(defsubclass circle obstacles (CID/OID int) (CENTER position) (RADIUS int)
    (key CID))

(defsubclass operator car_world (EMP_ID/CWID int) (DEPT string) (CAN_DRIVE car)
    (key EMP_ID))

(defsubclass car car_world (CAR_ID/CWID int) (YEAR int) (MODEL string)
    (PERMIT int) (key CAR_ID))

(defsubclass station car_world (SID/CWID int) (PLACE position) (PERMIT int)
    (key SID))

<methods>

; move an object 'a' from 'b' to 'c'
(defmethod (move int) (a car_world) (b position) (c position) ( ... ))

; rotate an object 'a' by 'd' degrees
(defmethod (rotate int) (a car_world) (d int) ( ... ))

; return the distance between two objects 'a' and 'b'
(defmethod (distance int) (a car_world) (b car_world) ( ... ))

; return the length of the shortest path between two objects 'a' and 'b'
(defmethod (shortest_path int) (a car_world) (b car_world) ( ... ))

; Given two geometric objects a and b, check if they are intersected or not.
; If intersected, return the size of the intersected area; otherwise return 0.
(defmethod (intersect int) (a car_world) (b car_world) ( ... ))

¹ A symbol is a sequence of alphanumeric characters.
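The effect of recursive inheritance with attribute renaming can be sketched as follows. This is only an illustration: the dictionary SCHEMA is a hand-made encoding of three of the classes above (parent, renamings, own attributes), and the function name attributes is hypothetical; it is not how OASIS stores its schema.

```python
# class name -> (superclass, {inherited name: new name}, attributes declared here)
SCHEMA = {
    "car_world": (None, {}, ["CWID"]),
    "obstacles": ("car_world", {"CWID": "OID"}, ["SIZE", "COLOR"]),
    "triangle": ("obstacles", {"OID": "TID"}, ["NODE1", "NODE2", "NODE3"]),
}

def attributes(cls):
    """Return the full attribute list of cls, applying renamings on the way down."""
    parent, renames, own = SCHEMA[cls]
    inherited = attributes(parent) if parent else []
    return [renames.get(a, a) for a in inherited] + own
```

For the class triangle this yields TID, SIZE, COLOR and the three nodes, in line with the description given before the schema.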


3.2.3 STORAGE STRUCTURES

In OASIS, objects in a class are stored in the form of a relation, where each tuple corresponds
to an object instance in the class. Considering this, in the remainder of this section, we use the term
“relation” and the term “class” interchangeably. An attribute value in OASIS can be a nested object,
and retrieval of each nested object is done directly through its key identifier (whose value is stored as
the value of the attribute). For simplicity, we assume a uniform cost for referencing an attribute
value.


3.3 A DATABASE PROGRAMMING LANGUAGE AND BASIC PATTERNS

As described earlier, an OASIS database program interacts with the OASIS environment through
the OASIS interpreter. A user interacts with the OASIS system with functions written in OASIS-LISP
(which is an extension of LISP) or OASIS-C (which is an interpretable version of C); both types of
functions can be used in an interleaved way to accomplish a complex user query. In the rest of this
section, we shall concentrate on OASIS-LISP. The term “query” will be used interchangeably with
the term “query program”.

Two  outstanding  features  of  OASIS-LISP  which  may  affect  the  evaluation  significantly  are  as 
follows: 

a)  Procedural  methods  are  allowed  in  queries. 

b)  Control  structures  are  allowed  in  a  program. 

In the following, we briefly explain these two features:

METHODS IN QUERIES

Methods are customized procedures associated with object classes. Calling methods within a query
program usually saves additional programming effort. For instance, the following query retrieves a
list of triangles each of which intersects a rectangle whose area is the same as its own.

(forall v1 in triangle
  (forall v2 in rectangle
    (cond ((and (ieq v1.AREA v2.AREA)
                (intersect v1 v2))
           (retrieve v1.TID)))))

Without the method intersect in the above example, we may need an additional program segment
checking the intersection of a triangle and a rectangle.
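The evaluation of such a query amounts to a nested loop with a method call in the condition. The sketch below is illustrative only: objects are assumed to be dictionaries, each shape carries a hypothetical bounding box (x0, y0, x1, y1), and intersect is a stand-in that returns the overlap area of the boxes rather than the actual OASIS method.

```python
def intersect(a, b):
    """Stand-in method: overlap area of two axis-aligned boxes, 0 if disjoint."""
    w = min(a["box"][2], b["box"][2]) - max(a["box"][0], b["box"][0])
    h = min(a["box"][3], b["box"][3]) - max(a["box"][1], b["box"][1])
    return w * h if w > 0 and h > 0 else 0

def query(triangles, rectangles):
    """Nested forall loops with a cond whose condition calls a method."""
    result = []
    for v1 in triangles:
        for v2 in rectangles:
            if v1["AREA"] == v2["AREA"] and intersect(v1, v2):
                result.append(v1["TID"])
    return result

triangles = [{"TID": 1, "AREA": 4, "box": (0, 0, 2, 4)},
             {"TID": 2, "AREA": 4, "box": (10, 10, 12, 14)},
             {"TID": 3, "AREA": 9, "box": (0, 0, 2, 2)}]
rectangles = [{"RID": 7, "AREA": 4, "box": (1, 1, 3, 3)}]
```

Placing the cheap comparison of areas before the method call means that intersect is evaluated only for pairs that pass the first test, which is the kind of ordering decision an optimizer for such queries has to make.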

CONTROL STRUCTURES

In OASIS-LISP, various control structures such as conditional statements and iteration loops are
included to enhance the scope of traditional query languages; these include while, do and forall, where
a while or do statement iterates until a condition fails or the induction variable reaches a preset limit
and a forall statement iterates for each instance of a given set. One example usage of iterations is to
realize a transitive closure. One can find all the connections between two nodes using the transitive
closure of connections.
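How an iteration realizes a transitive closure can be sketched as follows. The representation of connections as a set of pairs and the function name transitive_closure are assumptions made for illustration; the loop plays the role of a while statement that stops when no new connection is found.

```python
def transitive_closure(edges):
    """Repeatedly extend known connections by one edge until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in edges if b == c} - closure
        if not new:
            return closure
        closure |= new
```

For the chain 1 -> 2 -> 3 -> 4 the loop adds (1, 3) and (2, 4) in its first pass and (1, 4) in its second, and then terminates.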

In OASIS-LISP, most interesting relational (or set-oriented) operations can be programmed using
forall and cond statements. Consequently, the basic patterns of statements can be classified
according to the forall and cond statements in a query.

3.3.1 CANONICAL FORALL-COND STATEMENTS

A canonical query consists of a set of successively nested forall statements and a cond statement
in the innermost loop. It is in the following form:

(forall ... in R1
  ...
  (forall ... in Rk
    (cond (F action))) ... )

When the innermost action is a retrieve statement, this is equivalent to a relational query as follows
(written in QUEL):

RANGE OF v1 IS R1
...
RANGE OF vk IS Rk
RETRIEVE
WHERE condition

Because only arithmetic comparisons and aggregation methods are supported in a relational database,
including procedural methods causes some problems for traditional relational query optimization
strategies. In optimizing a query program, a canonical query can be considered as a basic pattern.
When a query program is processed, it should be transformed into a canonical query if possible. For
example, consider the following query, which includes a cond statement in the middle of a set of
successively nested forall statements.

(forall v_1 in R_1
  (forall v_2 in R_2
    (cond F_1
      (forall v_3 in R_3
        (cond (F_2 action))))))

Applying the commutative law between selections and cross products, the above query can be
transformed into a canonical query as follows:

(forall v_1 in R_1
  (forall v_2 in R_2
    (forall v_3 in R_3
      (cond ((and F_1 F_2) action)))))
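
The equivalence of the two forms can be checked with a small Python sketch. The relations `R1`, `R2`, `R3` and the predicates `f1`, `f2` below are hypothetical stand-ins for R_1..R_3, F_1 and F_2; the action simply collects the qualifying tuples.

```python
from itertools import product

# Hypothetical relations and conditions, for illustration only.
R1 = [1, 2, 3]
R2 = [2, 3, 4]
R3 = [3, 4, 5]


def f1(v1, v2):
    return v1 < v2


def f2(v1, v2, v3):
    return v1 + v2 == v3


def nested_form():
    """cond F_1 sits between the second and third forall."""
    out = []
    for v1 in R1:
        for v2 in R2:
            if f1(v1, v2):
                for v3 in R3:
                    if f2(v1, v2, v3):
                        out.append((v1, v2, v3))
    return out


def canonical_form():
    """All foralls first, then a single cond on (and F_1 F_2)."""
    out = []
    for v1, v2, v3 in product(R1, R2, R3):
        if f1(v1, v2) and f2(v1, v2, v3):
            out.append((v1, v2, v3))
    return out
```

Both functions return the same tuples in the same order; only the position of the test on F_1 differs, which is what makes the canonical form a safe normalization for read-only actions.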

3.3.2 NESTED STATEMENTS IN SUCCESSIVE FORALL STATEMENTS

Various statements can be nested in one or more successively nested forall statements. The
basic patterns of nested queries can be classified as follows:

(a)  A  GENERAL  COND  STATEMENT  (TYPE-GENERAL_COND) 

If we extend a canonical query with a general cond statement, we can obtain a query of the
form:

(forall ... in R_1
  ...
  (forall ... in R_k
    (cond (F_1 action_1)
          ...
          (F_n action_n))...)

where the generalized cond statement should be interpreted as:

(IF F_1 THEN DO action_1)
(ELSE IF F_2 THEN DO action_2)
...
(ELSE IF F_n THEN DO action_n)

(b) MULTIPLE COND STATEMENTS (TYPE-MULTIPLE_COND)

If multiple cond statements are included inside of a set of successively nested forall statements,
we would obtain a query of the form:

(forall ... in R_1
  ...
  (forall ... in R_k
    (cond (F_1 action_1))
    ...
    (cond (F_n action_n))...)

Semantically, the cond statements in the above query should be processed sequentially for each
instance of variable bindings (i.e., each tuple in the cross product of relations R_1, ..., R_k).

(c) NESTED FORALL STATEMENTS (TYPE-NESTED_FORALL)

When a forall statement is present with other statements (e.g., other forall or cond statements)
at the same level of nesting, it cannot be included in the outer forall statements. For example,
consider the following query:

(forall ... in R_1
  ...
  (forall ... in R_k
    (forall ... in R
      (retrieve ...))
    (cond (F action))...)

In this example, (forall ... in R (retrieve ...)) is an individual forall statement rather than a part of a
set of successively nested forall statements. Typical nested queries in a relational language can be
considered as instances of this type.

(d)  ASSORTED  STATEMENTS  (TYPE-ASSORTED) 

In general, an arbitrary combination of the statements available in OASIS-LISP can be nested.
Besides forall, cond, and retrieve, a statement in OASIS-LISP could be a method which may be a
data manipulation statement such as append, delete and replace. In this situation, optimization of a
nested statement may be affected by the presence of other statements. In particular, when data
manipulation statements are present along with other statements (e.g., cond and nested forall) inside of a
set of successively nested forall statements, data dependences should be analyzed for proper
optimization. As an example, consider the following query:

(forall ... in R_1
  ...
  (forall ... in R_k
    (cond (F_1 action_1))
    (replace R_i.A (plus R_i.A 10))
    (cond (F_2 action_2))))

Because the attribute values R_i.A, where 1 ≤ i ≤ k, are updated after the first cond statement in each
iteration, F_2 should be evaluated according to the updated values.
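
The dependence can be made concrete with a short Python sketch. The row layout (`id`, `A`), the single threshold test standing in for both F_1 and F_2, and the function name `run` are assumptions for illustration; the point is only that the second test must observe the value written by the replace statement.

```python
def run(rows, threshold):
    """Process each row as: cond F_1, replace A := A + 10, cond F_2."""
    first_hits, second_hits = [], []
    for row in rows:
        if row["A"] > threshold:      # F_1 sees the old value of A
            first_hits.append(row["id"])
        row["A"] = row["A"] + 10      # (replace R_i.A (plus R_i.A 10))
        if row["A"] > threshold:      # F_2 must see the updated value
            second_hits.append(row["id"])
    return first_hits, second_hits


rows = [{"id": 1, "A": 5}, {"id": 2, "A": 15}]
first_hits, second_hits = run(rows, threshold=12)
```

Row 1 fails the first test but passes the second, so an optimizer that evaluated both conditions against the original values would produce a different answer.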

In fact, a canonical query or a nested statement of type GENERAL_COND, MULTIPLE_COND
or NESTED_FORALL is a special case of a type-ASSORTED statement. Given a type-ASSORTED
statement, those special cases should be considered first.

3.3.3 GENERAL MULTI-LEVEL STATEMENTS (TYPE-GENERAL)

We define a level of nesting to be a set of successively nested forall statements and the body of
the innermost forall statement, where the body of the innermost forall statement can include another
level of nesting recursively. Generally, a nested query may consist of multiple levels of nesting,
where each level of nesting would be one of the basic patterns defined above (i.e., canonical queries
and nested statements of types GENERAL_COND, MULTIPLE_COND, NESTED_FORALL, and
ASSORTED).

3.4 PROCESSING A CANONICAL QUERY

In the previous section, we showed that a canonical query is semantically equivalent to a relational
query. However, including procedural methods introduces problems to conventional relational query
optimization techniques. For a large database, an optimal nested-loop algorithm could be inefficient
when the number of variables involved in a query is large. Realizing this, a non-linear search
approach based on query decomposition is taken in OASIS. The definitions of the query decomposition
algorithm [WoYo76] and connection graph are summarized in Appendix C. The main idea of
the decomposition algorithm can be summarized as follows:

(a) Perform lower-cost operations first, i.e., in the order of selection, equi-join, general joins and
Cartesian product.

(b) Keep the temporary relations small by selecting small relations first and disconnecting the graph
if possible.

In order to evaluate a canonical query written in OASIS-LISP, the original query decomposition
algorithm should be modified because procedural methods were not considered. While conditions in
the original query decomposition algorithm can be considered as logical methods in OASIS (see
Appendix A), a conjunct in OASIS-LISP can be a method of any type. In the following, we discuss
how to decrease the number of method calls when the query decomposition algorithm is followed.

In a connection graph, a procedural method can be denoted by a rectangular node, where its
input arguments and output arguments are represented by input arcs and output arcs, respectively. As
an example, an extended connection graph for the following query is shown in Figure 3.2.

(forall v1 in car
  (forall v2 in station
    (forall v3 in operator
      (forall v4 in manager
        (cond (and (ieq v1.PERMIT v2.PERMIT)
              (and (seq v1.MODEL v3.CAN_DRIVE)
              (and (seq v3.DEPT v4.DEPT)
              (and (shortest_path v1 v2 PATH)
                   (ile PATH 100)))))
          (retrieve all))))))

Figure 3.2. An extended connection graph.

In OASIS, procedural methods are procedures which can include either simple mappings or very
expensive computations (e.g., matrix computations). If a procedural method has only one input
argument and it is an attribute of a relation (we define this relation to be an input relation to the
method), the method will be executed for each tuple in the relation. In this case, evaluation of the
method does not change the number of tuples in the input relation. It attaches the values of output
arguments to each tuple in the input relation. If the input values of a procedural method come from
more than one relation and the relations are not connected by joins, the method should be evaluated
for all possible combinations (of instantiations) for its input variables (i.e., all the tuples from the
Cartesian product of the input relations).

Before a procedural method is evaluated, all the input relations of the method should have been
instantiated or dissected because the values of input arguments are needed for evaluation. In the
decomposition algorithm, evaluation of procedural methods should be included among dissections
because instantiations can always be done without increasing the number of tuples in relations.
During dissections, we can reduce the number of method calls by considering the effects of a dissection
on an input node (i.e., a node which represents an input relation) or a node which is connected to an
input node of the method.

When two relation nodes n and m are connected by a join edge, we define the reduction factor
$r_n^m$ associated with the edge to be

$$r_n^m = \frac{|m \text{ join } n|}{|n|}$$

Similarly, the reduction factor $r_m^n$ can be defined as

$$r_m^n = \frac{|m \text{ join } n|}{|m|}$$

If an input node in for a procedural method is a candidate for the next dissection, we can compute the
reduction factors for all the nodes $con_1, \ldots, con_k$ which are connected to in. If the value of any
$r_{in}^{con_i}$, where $1 \le i \le k$, is less than one, dissecting $con_i$ can reduce the number of method calls.
Similarly, if a candidate node cand for the next dissection is connected to an input node in for a method
to be evaluated, we compute the reduction factor $r_{in}^{cand}$. If $r_{in}^{cand}$ is greater than one, the input node in
should be dissected first to decrease the number of method calls. As an example, consider the
connection graph shown in Figure 3.3, assuming the cardinalities of the involved relations are given as
follows:

|triangle| = 100
|rectangle| = 100
|triangle join rectangle| = 500


Figure 3.3. A connection graph including a procedural method.


According to the decomposition algorithm, ignoring the method rotate, either rec or tri can be
selected for dissection. Assuming rec is selected, because the node rec is directly connected to the
input node tri of the method rotate, we compute the reduction factor $r_{tri}^{rec}$ before rec is dissected:

$$r_{tri}^{rec} = \frac{|triangle \text{ join } rectangle|}{|triangle|} = \frac{500}{100} = 5$$

This implies that if the join is performed, the method rotate should be evaluated for each tuple in the
result of the join, the cardinality of which is five times as large as that of relation triangle.
Considering this, the node tri should be dissected first, and, in this case, the number of method calls is
100.

If a method has more than one input relation, then any connection between the input relations
should be considered for the method evaluation. When two input relations for a method are connected
directly by an edge, evaluation of the edge may decrease the number of method calls because the
method will be evaluated for only those pairs of tuples which satisfy the condition on the edge. This
is a slight modification to the dissection operation because the method calls will be deferred until after
the edge is evaluated. When two input relations for a method are connected indirectly (i.e., they are
connected through some other nodes), we can compare the cardinalities of the resulting relation
produced by the connections and the Cartesian product of the two input relations to choose a smaller one.
If the former is chosen, evaluation of the method should be deferred until all the involved edges are
evaluated.
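
The triangle/rectangle calculation above can be reproduced with a small Python sketch. The sample data (100 triangles and 100 rectangles spread evenly over 20 colours, joined on colour) is invented so that the join has exactly 500 tuples; the helper names `join` and `reduction_factor` are likewise assumptions for this example.

```python
def join(left, right, pred):
    """Nested-loop join returning the qualifying (l, r) pairs."""
    return [(l, r) for l in left for r in right if pred(l, r)]


def reduction_factor(inp, other, pred):
    """r_inp^other = |other join inp| / |inp|."""
    return len(join(inp, other, pred)) / len(inp)


def same_color(t, r):
    return t["color"] == r["color"]


# Invented data: each colour occurs 5 times in each relation.
triangles = [{"id": i, "color": i % 20} for i in range(100)]
rectangles = [{"id": i, "color": i % 20} for i in range(100)]

r_tri_rec = reduction_factor(triangles, rectangles, same_color)
calls_if_join_first = len(join(triangles, rectangles, same_color))
calls_if_tri_first = len(triangles)
dissect_first = "tri" if r_tri_rec > 1 else "rec"
```

With these numbers the factor exceeds one, so evaluating the method on triangle before the join keeps the number of calls at the cardinality of triangle rather than that of the join result.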

Considering the above, the modified query decomposition algorithm can be summarized as follows:

ALGORITHM 3.1. Modified Query Decomposition Algorithm
INPUT: A connection graph for a canonical query.

OUTPUT: An access plan.

1) Do all instantiations. Here a method is instantiated (executed) if all of its input arguments have
been instantiated.

2) Select a relation node n for dissection based on the original query decomposition algorithm.

2.1) If n is an input relation for a method to be evaluated, compute the reduction factors $r_n^{con_i}$ with
respect to all the nodes $con_1, \ldots, con_k$ connected to n. If any reduction factor $r_n^{con_i}$,
where $1 \le i \le k$, is less than one, select the node $con_i$ which produces the minimal $r_n^{con_i}$ for
dissection.

2.2) If the node n is connected to an input node in of a method to be evaluated, compute the
reduction factor $r_{in}^n$. If $r_{in}^n$ is greater than one, select the node in for the next dissection.

3) Perform dissection on the selected node n'. If at this point all the input nodes to a method are
only connected through the method, evaluate the method for all the tuples in the Cartesian
product of the input nodes.

4) Repeat steps 1-3 until no edge remains.

5) Return the Cartesian product of the relations saved so far.

□ 
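
The node-selection refinement in step 2 of Algorithm 3.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `choose_node`, the dictionary-based graph, and the callable `rf(inp, other)` returning $r_{inp}^{other}$ are all hypothetical, and the sketch treats the two sub-steps as mutually exclusive, which the algorithm text does not spell out.

```python
def choose_node(candidate, method_inputs, neighbours, rf):
    """
    candidate:     node proposed by the original decomposition algorithm
    method_inputs: set of nodes that are inputs of a pending method
    neighbours:    dict mapping a node to the nodes joined to it
    rf:            callable (inp, other) -> reduction factor r_inp^other
    """
    adjacent = neighbours.get(candidate, [])
    # Step 2.1: the candidate itself feeds a method.
    if candidate in method_inputs:
        factors = {c: rf(candidate, c) for c in adjacent}
        shrinking = {c: f for c, f in factors.items() if f < 1}
        if shrinking:
            # Dissect the neighbour that shrinks the input relation most.
            return min(shrinking, key=shrinking.get)
        return candidate
    # Step 2.2: the candidate is joined to a method input node.
    for inp in adjacent:
        if inp in method_inputs and rf(inp, candidate) > 1:
            return inp
    return candidate
```

For the graph of Figure 3.3, calling the function with candidate rec, method input tri and a factor of 5 returns tri, matching the worked example.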

Example 3-1. Consider the connection graph shown in Figure 3.2. According to the original
decomposition algorithm, either operator or car can be selected for dissection. Assume $r_{car}^{operator}$ is
less than one, where

$$r_{car}^{operator} = \frac{|car \text{ join } operator|}{|car|}$$

According to step 2.1) of Algorithm 3.1, we dissect operator first. After the dissection, the relations
car and station can be instantiated. Subsequently, the join between operator and manager is
computed. Because the method shortest_path has two input nodes and they are connected directly, it
should be evaluated for each tuple in the result of the join between the two relations car and station.
Finally, all the car-station pairs with the shortest distance less than or equal to 100 are computed, and
the Cartesian product of such pairs and all the operator-manager pairs computed earlier is returned as
the result.

□

The modifications introduced in Algorithm 3.1 may change the basic order of dissections in the
original decomposition algorithm only when the change can reduce the number of method calls. In
addition, when a node to be dissected is one of the input nodes of a method, evaluation of the method
is deferred until all the input arguments are instantiated.

3.5 PROCESSING A TYPE-GENERAL_COND OR TYPE-MULTIPLE_COND QUERY

A type-GENERAL_COND or type-MULTIPLE_COND query may contain multiple conditional
clauses (i.e., conditions and associated actions). Basically these conditional clauses can be evaluated
sequentially by nested iterations. However, if the order of evaluation is not relevant to the meaning of
the query (e.g., read-only queries in general), for each conditional clause, we can derive a relation
whose tuples satisfy the condition using relational operations. Subsequently, evaluation of multiple
conditions can be optimized by considering common subexpressions [Kim82] [Kim84]. A heuristic
approach to optimize multiple conditions will be presented later by modifying the decomposition
algorithm.

3.5.1 MULTIPLE CONDITIONS

We repeat the general syntax of type-GENERAL_COND queries as follows:

(forall ... in R_1
  ...
  (forall ... in R_k
    (cond (F_1 action_1)
          ...
          (F_n action_n))...)

The simplest approach to processing a query of the above form would be to take a Cartesian product
of all the relations involved in the forall statements, then test each condition in turn. Assuming that
$|R_i| = n_i$ (for $1 \le i \le k$), the cost of evaluating such a query would be of the order $O(n_1 \times \cdots \times n_k)$.
However, if we collect only those tuples which will be used for the actions in the conditional clauses,
performing the Cartesian product of all the involved relations can be avoided.

As mentioned earlier, the cond statement in the above is semantically equivalent to an IF-THEN-ELSE
statement. Among the tuples in the Cartesian product of $R_1, \ldots, R_k$, an action will be executed
only for those which satisfy at least one of the conditions. In terms of relational algebra, one of the
actions is executed for the following candidate relation:

$$R_{cand} = \sigma_F(R_1 \times \cdots \times R_k)$$

where $F = F_1 \lor \cdots \lor F_n$. The disjunction of conditions can be expanded using unions, i.e.,

$$R_{cand} = \sigma_{F_1}(R_1 \times \cdots \times R_k) \cup \cdots \cup \sigma_{F_n}(R_1 \times \cdots \times R_k)$$

Now the cost of evaluating the subqueries $\sigma_{F_i}(R_1 \times \cdots \times R_k)$, $1 \le i \le n$, can be decreased by sharing
common subexpressions.
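
The relationship between the IF / ELSE IF semantics and the candidate relation can be illustrated with a Python sketch. It deliberately enumerates the full Cartesian product, so it shows only the semantics, not the optimization; the function names and the sample relations and conditions are assumptions made for this example.

```python
from itertools import product


def candidate_relation(relations, conditions):
    """Union of the selections sigma_{F_i}(R_1 x ... x R_k)."""
    cand = set()
    for f in conditions:
        cand |= {t for t in product(*relations) if f(*t)}
    return cand


def first_match(relations, conditions):
    """IF / ELSE IF semantics: index of the first condition a tuple satisfies."""
    fired = {}
    for t in product(*relations):
        for i, f in enumerate(conditions):
            if f(*t):
                fired[t] = i
                break
    return fired


# Hypothetical relations and conditions.
R1, R2 = [1, 2, 3], [1, 2]
conds = [lambda a, b: a == b, lambda a, b: a > b]
cand = candidate_relation([R1, R2], conds)
fired = first_match([R1, R2], conds)
```

The set of tuples for which some action fires is exactly the candidate relation, which is why computing the union of selections is enough to drive the actions.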

Similarly, consider a type-MULTIPLE_COND query in the following form:

(forall ... in R_1
  ...
  (forall ... in R_k
    (cond (F_1 action_1))
    ...
    (cond (F_n action_n))...)

In this case, a candidate relation for each cond statement can be computed. The candidate relation
$R_{cand_i}$ for $action_i$, $1 \le i \le n$, contains only those tuples which will actually be used for the execution
of $action_i$:

$$R_{cand_i} = \sigma_{F_i}(R_1 \times \cdots \times R_k)$$

The evaluation cost can be reduced by considering the sharing of common expressions among the $R_{cand_i}$,
$1 \le i \le n$. In the next subsection, Algorithm 3.1 is further modified to evaluate multiple conditions.

3.5.2 PROCESSING MULTIPLE CONDITIONS

As discussed in [ChMi86], a connection graph can be extended as follows to represent multiple
conditions. First, one candidate relation should be returned as the answer for each condition. Second,
the priority in selecting the nodes should be changed because some instantiations may prevent later
sharings.

Now, given a node for each relation R_i, 1 ≤ i ≤ n, each conjunct in each condition can be added as
an edge. Each edge is labeled with the number of the condition from which it comes. For example,
the connection graph for the following conditions is shown in Figure 3.4.

C1: (and (ile tri.SIZE 10)
         (and (seq tri.COLOR rec.COLOR)))

C2: (and (seq tri.COLOR rec.COLOR)
         (igt (distance rec cir) 50))

According to [ChMi86], in processing an extended connection graph, each condition is evaluated
separately unless some sharing is possible. The basic two operations (i.e., instantiation and dissection)
were modified as follows:

(a) After an instantiation has been performed on a relation, the edge and the node corresponding to the
constant are deleted, and the relation node is turned into a small node. If two conditions
are identical, only one small node is created.

(b) When the join conditions are different between two nodes, one dissection will be done for each
condition and a separate set of constant nodes is created. For two conditions that are identical,
only one set of constant nodes is produced.

Figure 3.4. A connection graph for multiple conditions.

When the above two types of basic operations are executed, the execution cost can be reduced
by sharing common subexpressions among conditions. However, in order to achieve the maximum
saving, a complete search with a combinatorial complexity is necessary [Jark84]. In most cases, a
heuristic approach with a relatively small search space would be viable. In evaluating multiple
conditions heuristically, the most important thing is to defer the instantiations properly. If two conditions
share a join and one of these two includes a selection on one of the relations (see Figure 3.5(a) for
an example), obviously the common join should be executed first. If two conditions share a join and
each of them has a different selection on one of the relations (see Figure 3.5(b) for an example), the
two relations have to be evaluated separately. However, as shown in Figure 3.5(c), two selection
conditions could be comparable (i.e., one condition subsumes the other). In other words, the result of one
selection (tri.SIZE < 100) is a superset of the result of the other selection (tri.SIZE < 10). In this
situation, the more general selection (tri.SIZE < 100) can be executed first, followed by the join
operation. The other selection will be executed based on the result of the join.
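
The ordering for comparable selections can be sketched in Python. The relation contents, the attribute names `size` and `color`, and the function names are invented for illustration; the sketch compares a plan that shares the join after the more general selection with a plan that evaluates each condition separately.

```python
def join_on_color(tris, recs):
    return [(t, r) for t in tris for r in recs if t["color"] == r["color"]]


def shared_plan(tris, recs):
    """General selection, one shared join, then the narrower selection."""
    general = [t for t in tris if t["size"] < 100]
    joined = join_on_color(general, recs)              # computed once
    narrow = [(t, r) for (t, r) in joined if t["size"] < 10]
    return joined, narrow


def separate_plan(tris, recs):
    """Each condition evaluated on its own, with two joins."""
    wide = join_on_color([t for t in tris if t["size"] < 100], recs)
    narrow = join_on_color([t for t in tris if t["size"] < 10], recs)
    return wide, narrow


# Hypothetical data.
tris = [
    {"size": 5, "color": "red"},
    {"size": 50, "color": "red"},
    {"size": 500, "color": "blue"},
]
recs = [{"color": "red"}, {"color": "blue"}]
```

Both plans return the same pair of results, but the shared plan performs the join only once because the narrower selection is applied to the join result.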

When methods are considered, the order of evaluation with sharing of subexpressions as
described above may conflict with the procedure described in the previous section. In other words,
evaluating common subexpressions may increase the number of method calls. To avoid this, we can dissect
an input node of a method if the cardinality of the input node is known to be increased (by
computing the reduction factors) after including it in a common subexpression. Considering the above, a
modified decomposition algorithm that considers the evaluation of multiple conditions and methods
can be summarized as follows:

Figure 3.5. Examples of ordering common expressions.

ALGORITHM 3.2. Query Decomposition Algorithm for Multiple Conditions
INPUT: A connection graph for multiple conditions.

OUTPUT: An access plan.

0) Do instantiations on relation nodes which are not incident upon any common edge. Here a
method is instantiated only if all of its input relations have been instantiated.

1) Select a node n for dissection in the following order of preference:

1.1) Select a node which is incident on a common join edge between two relation nodes for
dissection. If the node is connected to an input node in of a method and the reduction
factor $r_{in}^n$ is greater than one, dissect in instead of n. If the node is connected to any constant
with a selection edge, do the following:

1.1.1) If the selection is common to the set of conditions which share the common join
edge, execute the instantiation before the node is dissected.

1.1.2) If the node is connected to a different constant for each of the conditions that share
the common join edge and these selections are comparable, execute only the most
general selection before the node is dissected. The other selections are done
immediately following the dissection.

1.2) Select a node which is incident on a common join edge between a relation node (i.e., itself)
and a method for dissection. This choice can be superseded by the same considerations
listed in step 2 of Algorithm 3.1. If the node is connected to any constant with a selection
edge, do the following:

1.2.1) If the selection is common to the set of conditions which share the common edge,
execute the instantiation before the node is dissected.

1.2.2) If the node is connected to a different constant for each of the conditions that share
the common edge and these selections are comparable, execute only the most
general selection before the node is dissected. The other selections are done immediately
following the dissection.

1.3) If no node can be selected in the above, select a node for dissection according to steps 2 and 3
of Algorithm 3.1.

2) Do step 1 until no edge remains.

3) Return the Cartesian product of the relations saved so far for each condition.

□ 

Example 3-2. Consider the multiple queries shown in Figure 3.4. Initially, no instantiation is
possible since none of them is shared. According to step 1.1 of Algorithm 3.2, the node tri is
selected for dissection in Step 1. (This is arbitrary; the node rec can be selected as well.) Assuming
that the reduction factor between tri and rec is 5, rec is dissected instead of tri. The method
distance with input node rec can be evaluated in Step 2. In Step 3, the common join with condition
(seq tri.COLOR rec.COLOR) is performed. Subsequently, the selection (ile tri.SIZE 10) is
executed before the evaluation of the method distance.

□ 

The algorithm for multiple queries described in this subsection considers procedural methods in
sharing of common subexpressions among multiple conditions. One minor modification made to the
decomposition algorithm presented in [ChMi86] is the change of the order of preference in selecting
operations. Clearly, this algorithm does not generate a globally optimal access plan while the methods
in [Sell88] and [PaSe88] do. The plan generated by this algorithm could be noticeably expensive
when the cost of evaluating a common subexpression is extremely high, whereas that cost could be
avoided if each condition were processed separately. In order to prevent this situation, we could include
some checking procedure prior to the evaluation of each common subexpression. For example, we
can estimate the cardinality of the result relation for a common subexpression and take alternatives if
it is too large.

3.5.3 PROGRAM TRANSFORMATION

Once each R_cand_i, 1 ≤ i ≤ n, is obtained, a query of type-GENERAL_COND given earlier in this
section can be transformed into the following:

(forall v_1 in R_cand_1
  action_1)

(forall v_2 in (R_cand_2 - R_cand_1)
  action_2)

...

(forall v_n in (R_cand_n - R_cand_1 - ... - R_cand_(n-1))
  action_n)

Using a temporary relation, a number of subtraction operations can be saved in the above query:

(forall v_1 in R_cand_1
  action_1)

(forall v_2 in (R_cand_2 - R_cand_1)
  action_2)

(set temp (R_cand_1 U R_cand_2))

(forall v_3 in (R_cand_3 - temp)
  action_3)

(set temp (temp U R_cand_3))

...

(set temp (temp U R_cand_(n-1)))

(forall v_n in (R_cand_n - temp)
  action_n)
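
The temporary-relation version of the transformation can be sketched in Python with sets. The function name `run_general_cond`, the use of `sorted` to make the iteration order deterministic, and the sample candidate relations are assumptions for this illustration; `seen` plays the role of `temp`.

```python
def run_general_cond(cands, actions):
    """
    cands[i]   : set of tuples satisfying F_(i+1)  (R_cand_(i+1))
    actions[i] : callable applied to each tuple that reaches it
    """
    seen = set()                          # the temporary relation `temp`
    for cand, action in zip(cands, actions):
        for t in sorted(cand - seen):     # R_cand_i - temp
            action(t)
        seen |= cand                      # (set temp (temp U R_cand_i))
    return seen


# Hypothetical candidate relations over single-attribute tuples.
log = []
actions = [lambda t, i=i: log.append((i, t)) for i in range(3)]
run_general_cond([{1, 2}, {2, 3}, {1, 3, 4}], actions)
```

Each tuple triggers only the action of the first condition it satisfies, and only one difference and one union are computed per clause instead of a growing chain of subtractions.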

Similarly, a query of type-MULTIPLE_COND can be transformed into the following:

(forall v_1 in R_cand_1
  action_1)

(forall v_2 in R_cand_2
  action_2)

...

(forall v_n in R_cand_n
  action_n)

The evaluation cost of the query can thus be reduced by considering common subexpressions in the
transformed query.


3.6  PROCESSING  A  TYPE-NESTED_FORALL  QUERY 

When a forall statement or a set of successively nested forall statements is present with other
statements in the body of another set of successively nested forall statements, a query is not
canonical and a different optimization technique is needed. In OASIS, a type-NESTED_FORALL query is
transformed to a canonical query if it is equivalent to a traditional nested query [Kim82]. If a
type-NESTED_FORALL query cannot be transformed into a canonical one, the nested forall statement
can be optimized by avoiding repeated processing of invariant computations inside of the statement.
In this subsection, we describe an approach to detecting loop invariants inside of a nested forall
statement. To start, given a nested forall statement within another nested forall statement, we define
its associated inside loops to be all the forall statements included in the statement and its associated
outside loops to be all the forall statements which iterate on the nested forall statement. As an
example, consider the following query:

(forall v1 in triangle
  (forall v2 in rectangle
    (forall v3 in circle
      (cond ((and (intersect v1 v2)
                  (seq v2.COLOR v3.COLOR))
             (retrieve v3.AREA into Temp)))))
  (cond (ieq v1.AREA (imax Temp))
        (retrieve v1.TID)))

We can derive the following:

nested forall statement = (forall v2 in rectangle ...)
inside loops = { (forall v2 in rectangle ...),
                 (forall v3 in circle ...) }
outside loops = { (forall v1 in triangle ...) }

Given a nested forall statement, a connection graph can be constructed for its associated inside loops.
The connection graph for the nested forall statement (forall v2 in rectangle ...) is shown in Figure
3.6. Note that as before, we use arrows to represent input arguments for procedural methods.


Figure  3.6.  A  connection  graph. 


Given a connection graph, we traverse it from all the nodes corresponding to the relations
included in its outside loops (e.g., triangle in this example). In the traversal, we follow all incident
arrows and edges on a current node, where arrows are traversed only in the forward direction. After
the traversal, we can collect all the edges which have not been passed as loop invariants, and these
edges can be precomputed outside of the nested forall statement only once. In this example, we start
the traversal with the node tri. The traversal soon terminates at the node intersect, thus leaving the edge
between rec and cir not traversed. Consequently, the join between rec and cir is considered loop
invariant and can be precomputed and saved. As a result, the above query can be transformed into
the following:

(forall v1 in triangle
  (forall v2 in new
    (cond (intersect v1 v2.RECTANGLE)
          (retrieve v2.AREA into Temp)))
  (cond (ieq v1.AREA (imax Temp AREA))
        (retrieve v1.TID)))

where the temporary relation new can be computed as

(forall v3 in rectangle
  (forall v4 in circle
    (cond (seq v3.COLOR v4.COLOR)
          (retrieve v3 and v4.AREA into new))))
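
The traversal used to detect loop invariants can be sketched in Python as a breadth-first search. The graph encoding (undirected join edges as `frozenset` pairs, method arguments as directed `(source, target)` arrows) and the function name `loop_invariant_edges` are assumptions for illustration, not the OASIS data structures.

```python
from collections import deque


def loop_invariant_edges(edges, arrows, outside_nodes):
    """
    edges:         set of frozenset({a, b}) join edges (undirected)
    arrows:        set of (src, dst) arcs, followed only from src to dst
    outside_nodes: relations iterated by the outside loops
    Returns the join edges never traversed from the outside nodes.
    """
    reached = set(outside_nodes)
    traversed = set()
    queue = deque(outside_nodes)
    while queue:
        node = queue.popleft()
        nxt = []
        for e in edges:
            if node in e:
                traversed.add(e)
                nxt.extend(e - {node})
        # Arrows are followed in the forward direction only.
        nxt.extend(dst for (src, dst) in arrows if src == node)
        for other in nxt:
            if other not in reached:
                reached.add(other)
                queue.append(other)
    return edges - traversed


# Encoding of the example: rec and cir joined on COLOR, tri and rec feed intersect.
edges = {frozenset({"rec", "cir"})}
arrows = {("tri", "intersect"), ("rec", "intersect")}
invariant = loop_invariant_edges(edges, arrows, ["tri"])
```

Starting from tri, the search stops at intersect because the arrow from rec cannot be followed backwards, so the rec-cir join is reported as invariant and can be hoisted into the temporary relation.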


3.7  PROCESSING  A  TYPE-ASSORTED  QUERY 

When a number of statements are arbitrarily combined and nested in a set of successive forall
statements, evaluation or optimization of a nested statement may be affected by other nested
statements. In this section, update statements such as append, delete and replace are considered.

3.7.1  EVALUATION  OF  ASSORTED  STATEMENTS 

A  statement  in  OASIS  could  be  a  forall  statement,  a  cond  statement,  a  method,  or  an  update 
statement.  Even  though  a  method  can  stand  alone  as  a  statement,  in  most  cases  methods  are  included 
in  a  forall  or  a  cond  statement  as  a  part  of  its  condition  or  its  action. 

Adjacent forall or cond statements can usually be evaluated with the consideration of common
subexpression sharing unless update operations are included in one of these statements. If no update
operation is included, connection graphs for these statements can be merged into one and Algorithm
3.2 can be applied to evaluate this graph. For example, the following type-ASSORTED query

(forall v1 in triangle
  (forall v2 in rectangle
    (forall v3 in circle
      (cond ((and (seq v1.COLOR v2.COLOR)
                  (ieq v2.AREA v3.AREA)))
            (retrieve v3.AREA)))
    (cond (seq v1.COLOR v2.COLOR)
          (retrieve v1.TID))))


This query can be decomposed into two sub-queries:


statement 1:

(forall v1 in triangle
  (forall v2 in rectangle
    (forall v3 in circle
      (cond ((and (seq v1.COLOR v2.COLOR)
                  (ieq v2.AREA v3.AREA)))
            (retrieve v3.AREA)))))


statement 2:

(forall v1 in triangle
  (forall v2 in rectangle
    (cond (seq v1.COLOR v2.COLOR)
          (retrieve v1.TID))))


The merged connection graph for this query is shown in Figure 3.7(a).


Figure 3.7. A merged connection graph for a type-ASSORTED query.


Note that in the connection graph shown in Figure 3.7(a), each relation node is marked with the
statements in which it is iterated. In this example, relations triangle and rectangle are iterated in
both statements, but relation circle is iterated only in statement 1. Given a merged connection graph,
Algorithm 3.2 can be applied to process it until no edge remains. A candidate relation for each
statement i can be generated by a Cartesian product of all the disjoint nodes marked with i.

The connection graph shown in Figure 3.7(a) can be processed by dissecting the node rec first
since it disconnects the graph and it is incident on a common join edge. After this operation, two
nodes r(1,2) and s(1) remain as shown in Figure 3.7(b), where the node r(1,2) represents the result
of the join between the relation triangle and a tuple t in the relation rectangle, and (1,2) denotes that
this node belongs to statements 1 and 2. Similarly, s(1) stands for the result of the join between
relation circle and a tuple t in rectangle, and it belongs to statement 1. As a result, the candidate
relations for statements 1 and 2 can be generated as follows:

$R_{cand_1} = \bigcup_{t \,\in\, r(1,2)} \bigl( r(1,2) \times s(1) \bigr)$

$R_{cand_2} = \bigcup_{t \,\in\, r(1,2)} r(1,2)$

Evaluation of n consecutive statements by Algorithm 3.2 is viable when the summation of the
cardinalities of all the candidate relations (i.e., $\sum_{i=1}^{n} |R_{cand_i}|$) is less than the
cardinality of the Cartesian product of all the involved relations and there is a reasonable amount of
sharing. If at least one statement runs on all the combinations of tuples (i.e., the Cartesian product)
of the involved relations, applying Algorithm 3.2 cannot reduce the evaluation cost compared to the
simple nested iterative approach. A cond statement with a condition which is always true or a method
which stands alone as a statement are examples. Given a sequence of statements nested in a set of
successively nested forall statements, estimating $\sum_{i=1}^{n} |R_{cand_i}|$ and the amount of
sharing prior to applying Algorithm 3.2 is necessary.
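
The cardinality part of this test is simple arithmetic, sketched below in Python. The function name and the sample sizes are hypothetical, and the "reasonable amount of sharing" condition is not modeled at all.

```python
from math import prod


def candidate_evaluation_is_viable(candidate_sizes, relation_sizes):
    """True when sum_i |Rcand_i| is below the size of the full Cartesian product."""
    return sum(candidate_sizes) < prod(relation_sizes)


# Hypothetical sizes for triangle, rectangle and circle.
relation_sizes = [100, 50, 20]          # Cartesian product has 100000 tuples

# Two selective statements: the candidate relations stay small.
selective = candidate_evaluation_is_viable([1200, 300], relation_sizes)

# One statement ranges over every tuple combination: no gain is possible.
exhaustive = candidate_evaluation_is_viable([100000, 300], relation_sizes)
```

In practice the candidate sizes would have to be estimated before the statements are evaluated, which is the estimation step the paragraph above calls for.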

3.7.2 UPDATE OPERATIONS IN QUERY PROGRAMS

Evaluation of multiple statements is more restricted if update operations (i.e., append, delete,
and replace) are involved. When update operations are included in a series of statements, some
relations could be modified by a statement. Consequently, any statement which follows it and uses the
modified relations should wait until the modification is done before it is evaluated. This is an
example of data dependence [Kuck81], and we say that the latter statement depends on the former.
Generally, when there are data dependences among nested statements, a statement cannot be evaluated
separately using the candidate relations described in the previous subsection. For example, consider
the following query:

(forall v1 in triangle
  (forall v2 in rectangle
    (forall v3 in circle
      (cond ((and (seq v1.COLOR v3.COLOR)
                  (ieq v2.AREA v3.AREA)))
            (replace v3.AREA (iadd v3.AREA 10)))
      (cond ((and (intersect v1 v3)
                  (ieq v2.AREA v3.AREA)))
            (retrieve (list v1.TID v2.RID v3.CID))))))

We have two statements nested (in parallel) in a set of nested forall statements:

S1: (cond ((and (seq v1.COLOR v3.COLOR)
                (ieq v2.AREA v3.AREA)))
          (replace v3.AREA (iadd v3.AREA 10)))

S2: (cond ((and (intersect v1 v3)
                (ieq v2.AREA v3.AREA)))
          (retrieve (list v1.TID v2.RID v3.CID)))

In the above example, S1 updates the relation circle, and S2 reads the value of the relation. The
execution of S2 should wait until S1 is executed in each iteration. Because the relation circle might
be updated again in the next iteration, the execution of S2 cannot be postponed to the next iteration
either. As a result, the nested statements S1 and S2 should be evaluated sequentially in each iteration
of the forall loops. In the remainder of this subsection, we describe the data dependence among
nested statements.

We define the read_set and the write_set for each statement as follows. The read_set of a
statement is a set of relations whose contents are read by the statement. Similarly, the write_set of a
statement is a set of relations whose contents are modified by the statement. This can be rewritten as

read_set($S_i$) = { R | the contents of R are read by $S_i$ }
write_set($S_i$) = { R | the contents of R are modified by $S_i$ }

Given a sequence of nested statements $S_1, \ldots, S_n$ in a set of nested forall loops, they can be
numbered so that i < j if $S_i$ comes before $S_j$ in the sequence. The procedure of finding the data
dependence among a set of nested statements can be summarized as follows:

(1) Derive read_set and write_set for each statement $S_i$, $1 \le i \le n$.

(2) We have a dependence pair ($S_i$, $S_j$) if

    i)   $i < j$

    ii)  write_set($S_i$) $\cap$ read_set($S_j$) $\neq \emptyset$

    iii) write_set($S_i$) $\cap$ read_set($S_j$) $\cap$ write_set($S_k$) $= \emptyset$, for all $i < k < j$.

In each dependence pair ($S_i$, $S_j$), $S_j$ depends on $S_i$.

Example 3-3. For the example query in this subsection, we can derive the following:

read_set($S_1$) = { triangle, rectangle, circle }
write_set($S_1$) = { circle }
read_set($S_2$) = { triangle, rectangle, circle }
write_set($S_2$) = { }

Through a dependence analysis, ($S_1$, $S_2$) is found to be a dependence pair because the set
{ R | R $\in$ write_set($S_1$) $\cap$ read_set($S_2$) } (which is { circle }) is not empty.

□ 
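
The three conditions for a dependence pair translate directly into set operations. The following Python sketch is one possible rendering; the function name and the list-of-sets representation are assumptions, and the first sample input is taken from Example 3-3.

```python
def dependence_pairs(read_sets, write_sets):
    """Return 1-based pairs (i, j) such that statement S_j depends on S_i."""
    n = len(read_sets)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):                      # condition i): i < j
            shared = write_sets[i] & read_sets[j]
            if not shared:                             # condition ii) fails
                continue
            # condition iii): no statement strictly between S_i and S_j
            # modifies any of the shared relations
            if all(not (shared & write_sets[k]) for k in range(i + 1, j)):
                pairs.append((i + 1, j + 1))
    return pairs


# Example 3-3: S1 replaces circle tuples, S2 only retrieves.
reads = [{"triangle", "rectangle", "circle"}, {"triangle", "rectangle", "circle"}]
writes = [{"circle"}, set()]
example_pairs = dependence_pairs(reads, writes)

# A hypothetical three-statement case where S2 overwrites circle again.
chain_pairs = dependence_pairs([{"circle"}] * 3, [{"circle"}, {"circle"}, set()])
```

If the returned list is empty, the statements can be evaluated through candidate relations as in the previous subsection; otherwise nested iteration is required.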

Given a type-ASSORTED query which includes update operations, we search for any data
dependence among nested statements as described above. If no data dependence is included in the
query, it can be evaluated as described in the previous subsection. Otherwise, the query should be
evaluated by nested iterations.


3.8 PROCESSING A TYPE-GENERAL QUERY

A type-GENERAL query may contain multiple levels of nesting, where each level of nesting
would be one of the basic patterns, i.e., canonical, GENERAL_COND, MULTIPLE_COND,
NESTED_FORALL, or ASSORTED. Given a type-GENERAL query, first it can be optimized globally
by moving some computations to outer levels of nesting from inside. Then, it can be processed
by applying the optimization techniques discussed earlier.

3.8.1 GLOBAL OPTIMIZATION

In  a  query  with  multi-levels  of  nesting,  computation  in  a  nested  loop  is  supposed  to  be  evaluated 
for  each  iteration  of  an  outer  loop.  In  this  situation,  if  some  computations  in  a  nested  loop  could  be 
executed  in  an  outer  loop  without  changing  the  results,  the  overall  evaluation  cost  can  be  reduced. 

Detection  of  loop-invariants  has  been  discussed  earlier  in  this  section.  Loop-invariants  of  a  loop 
denote  those  computations  whose  results  do  not  depend  on  the  iteration  variables  of  the  loop.  In 
some  cases,  even  those  computations  which  are  not  loop-invariants  can  be  removed  from  a  loop  if 
necessary  variables  are  saved  properly.  Generally,  we  can  postpone  any  computation  until  its  results 
are  used  by  some  others.  The  reason  for  postponing  computations  is  that  the  amount  of  postponed 
computations  may  be  reduced  after  they  have  gone  through  database  operations  such  as  joins  and 
selections. 

To optimize a query globally, levels are introduced in a connection graph for queries with
multiple levels of nesting. Levels of nesting can be numbered by setting the outermost level to level 1
and increasing the number by one as it goes down to inner loops. Two adjacent levels are divided by a
dotted line in a connection graph. A connection graph with the notation of levels for the following
nested query is shown in Figure 3.8.

(forall v1 in obstacle
  (forall v2 in triangle
    (forall v3 in rectangle
      (cond ((and (setq INTSEC (intersect v2 v3))
                  (setq S (get_area INTSEC))))
            (retrieve v3.AREA S into Temp1)))
    (cond (ieq v2.AREA (imax Temp1 "" AREA))
          (retrieve v2.TID Temp1.S into Temp2)))
  (cond ((and (include Temp2.TID v1)
              (igt Temp2.S 100)))
        (retrieve Temp2.TID)))

In Figure 3.8, the dotted arrows designate direct copies between levels and they do not represent
any computation. In this example, the derived attribute S in level 3 is not used until the outermost
query (i.e., level 1) is evaluated. We can notice that the number of calls for the method get_area can
be reduced significantly if we defer the evaluation until level 1. To do so, the input argument
INTSEC should be saved in temporary relations.

In order to generalize this observation, we need a simple data-flow analysis. Computations in a
query program can be divided into two types: relational operations (e.g., join and selection) and
procedural methods. For each operation C, we define the following:

USE[C] = the set of variables whose values are used by C.

GEN[C] = the set of variables whose values are generated by C.


Figure 3.8. A connection graph for a nested query.


For a relational operation, the set USE includes all the relations and constants which are involved in
the operation and the set GEN includes the result relation. For a procedural method, the set USE
includes all the input arguments and the set GEN includes all the output arguments. For each level
$L_i$, GEN[$L_i$] includes the iteration variables which can be "used" in $L_i$ and all the variables
available from all the temporary relations in $L_i$. According to the above, GEN[$L_i$] includes all the
iteration variables for level $L_j$, where $1 \le j \le i$.

To analyze the data flows in each level $L_i$, we first derive the USE and GEN sets for all the
operations included. For each GEN in $L_i$, search all USE sets in $L_i$ to check if any variable in it is
used in that level. If none of the variables in GEN[C] is used, the evaluation of C can be moved to
the next higher level, i.e., $L_{i-1}$. The details of this procedure are described in the next algorithm.
Because it is very expensive to save all the input relations for relational operations, only procedural
methods are considered in the next algorithm.

ALGORITHM 3.3. Global Optimization

INPUT: A connection graph for a type-GENERAL query.

OUTPUT: An optimized type-GENERAL query.

Suppose the given query has n levels of nesting. For each level $L_i$, where i goes from n down to 1,
do the following:

1)  Derive GEN[$L_i$] and GEN and USE for each operation in $L_i$.

2)  For each procedural method $M_j$, search through all USEs to check whether any variable in
    GEN[$M_j$] is used or not. If none of the variables in GEN[$M_j$] is used at the level, do the
    following:

    2.1) If any variable in USE[$M_j$] is an iteration variable, go to the beginning of Step 2 and
         consider the next method.

    2.2) Save all the variables in USE[$M_j$] into a temporary relation and move $M_j$ into the next
         level $L_{i-1}$ with proper connections between members of USE[$M_j$] and GEN[$C_j$].

3)  Repeat Step 2 until no method can be moved.

□ 
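
Steps 2 and 3 for a single level can be sketched in Python as follows. This is only an illustration under simplifying assumptions: operations are given as a hypothetical mapping from name to a (USE, GEN) pair, retrieve statements are treated as plain copies and left out, and the actual saving of arguments into a temporary relation is not shown.

```python
def movable_methods(ops, iteration_vars, methods):
    """Return the procedural methods of one level that can move to the next higher level.

    ops maps an operation name to (USE set, GEN set); methods names the
    procedural methods among them.
    """
    remaining = dict(ops)
    moved = []
    changed = True
    while changed:                                  # Step 3: repeat until stable
        changed = False
        for name in list(remaining):
            if name not in methods or name not in remaining:
                continue
            use, gen = remaining[name]
            still_used = any(gen & other_use
                             for other, (other_use, _) in remaining.items()
                             if other != name)
            if still_used:                          # a generated variable is needed here
                continue
            if use & iteration_vars:                # Step 2.1: needs an iteration variable
                continue
            moved.append(name)                      # Step 2.2: move to level i-1
            del remaining[name]
            changed = True
    return moved


# Level 3 of the nested query shown in Figure 3.8.
level3 = {
    "intersect": ({"v2", "v3"}, {"INTSEC"}),
    "get_area": ({"INTSEC"}, {"S"}),
}
moved_up = movable_methods(level3, {"v2", "v3"}, {"intersect", "get_area"})
```

With these inputs only get_area is reported as movable; intersect stays because its USE set contains iteration variables.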

Example 3-4. Consider the connection graph shown in Figure 3.8. At level 3, we can derive the
following:

GEN[$L_3$] = { v3, v2 }

USE[intersect] = { v3, v2 }

GEN[intersect] = { INTSEC }

USE[get_area] = { INTSEC }

GEN[get_area] = { S }

With a simple search, we can find that the variable in GEN[get_area] (i.e., S) is not used by any
other computations. Because USE[get_area] (i.e., INTSEC) is not an iteration variable (i.e., v3 and
v2), the method get_area can be moved into level 2 by saving INTSEC into a temporary relation.
Without get_area, the next iteration of Step 2 in Algorithm 3.3 will find that GEN[intersect] (i.e.,
INTSEC) is not used, which makes intersect a candidate to be moved up as well. In this example,
however, the method intersect cannot be moved because its USE includes some iteration variables.

After applying Algorithm 3.3 to levels 2 and 1, the connection graph is changed as shown in
Figure 3.9 and the corresponding query program is as follows:

(forall v1 in obstacle
  (forall v2 in triangle
    (forall v3 in rectangle
      (cond ((setq INTSEC (intersect v2 v3)))
            (retrieve v3.AREA INTSEC into Temp1)))
    (cond (ieq v2.AREA (imax Temp1 "" AREA))
          (retrieve v2.TID Temp1.INTSEC into Temp2)))
  (cond ((and (include Temp2.TID v1)
              (and (setq S (get_area Temp2.INTSEC))
                   (igt S 100))))
        (retrieve Temp2.TID)))

□


Figure 3.9. An optimized connection graph.


In the above example, the number of calls to the method get_area is reduced because the
temporary relations have been operated on by a set of selections before the method get_area is
evaluated in the modified program. Once a type-GENERAL query is optimized by the above algorithm,
each level of nesting can be optimized further by applying an appropriate technique discussed earlier.
In the next subsection, we describe the general procedure to evaluate a type-GENERAL query.

3.8.2 PROCEDURE OF EVALUATING A TYPE-GENERAL QUERY

Given  a  type-GENERAL  query,  it  can  be  globally  optimized  first  as  discussed  in  the  previous 
subsection.  Further  optimization  and  evaluation  is  based  on  the  optimization  techniques  for  basic  pat¬ 
terns.  One  way  to  represent  the  structure  of  a  given  query  is  to  use  a  query  graph. 

A query graph is a binary tree, where each node denotes a statement. Each node can have up to
two children: a left child and a right child, where the left child represents the first statement among its
nested statements and the right child represents the next statement in the same level of nesting. For
example, the query graph for the transformed query in the previous subsection is shown in Figure
3.10.


Figure  3.10.  A  query  graph. 


In this query graph, nodes are labelled by F, I, R and RI to represent statements of types forall,
cond, retrieve and retrieve_into, respectively.
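
A query graph of this kind is a left-child / right-sibling encoding of the nesting structure. The Python sketch below builds one for a small hypothetical query (not the one in Figure 3.10) and walks it in post-order, which visits nested statements before the statement that contains them. The `Node` class, the numbered labels, and the choice to hang a retrieve action under its cond node are assumptions of this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: str                        # "F", "I", "R" or "RI", numbered for readability
    left: Optional["Node"] = None     # first nested statement
    right: Optional["Node"] = None    # next statement at the same level of nesting


def post_order(node: Optional[Node]) -> List[str]:
    """Visit left child, then right child, then the node itself."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.label]


# (forall v1 in A
#   (forall v2 in B (cond c1 (retrieve ... into T)))
#   (cond c2 (retrieve ...)))
i1 = Node("I1", left=Node("RI"))
i2 = Node("I2", left=Node("R"))
f2 = Node("F2", left=i1, right=i2)    # I2 follows F2 at the same level
f1 = Node("F1", left=f2)

order = post_order(f1)
```

The resulting order lists the innermost statements first and the outermost forall last, which is the bottom-up order needed when structural changes must propagate upward.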

The  optimization  techniques  discussed  in  this  section  can  be  divided  into  two  types: 

a)  Type A: Techniques which transform a given query syntactically.

    Techniques of this type are those developed for type-NESTED_FORALL queries (i.e.,
    detecting loop-invariants and transforming them into canonical queries).

b)  Type B: Techniques which speed up the evaluation of a given query.

    Techniques of this type are those developed for canonical queries and for types
    GENERAL_COND, MULTIPLE_COND and ASSORTED.

Techniques of type A may change the structure of a given query or move some computations
into upper levels. For this reason, techniques of type A should be applied prior to those of type B.
Also, they should be applied in a bottom-up fashion (i.e., from the lower levels to the upper levels) so
that any change in the structure of the query can be propagated to the upper levels.

Techniques  of  type  B  can  include  some  modifications  to  the  statements,  but  those  modifications 
are  internal  and  do  not  affect  the  overall  structure  of  a  given  query.  Techniques  of  type  B  are  applied 
in  a  top-down  fashion. 

In summary, a general procedure for processing a type-GENERAL query is described in the
following algorithm.

ALGORITHM 3.4. OPTIMIZATION OF A TYPE-GENERAL QUERY

INPUT: A query graph for a type-GENERAL query.

OUTPUT: An optimized evaluation procedure.

1.  Apply Algorithm 3.3 to the query graph for global optimization.

2.  Search the query graph in the post-order (i.e., search the graph recursively in the order of left
    child, right child, and parent) and identify basic patterns in it.

3.  Optimize all type-NESTED_FORALL patterns in a bottom-up fashion.

4.  Evaluate the optimized query in a top-down fashion. For each level, apply appropriate
    optimization techniques.

□


4. CONSTRUCTIVE PLANNING

Parallel to queries, in an object base we define an update query to be a formula of the form:

declarations

action

such that conditions

Unlike queries, the purpose of update queries is to move an object base from one state to another
state, subject to the constraints imposed by the system. The major problem with conjunctive update
queries is that the operations in a query may be specified in an order such that the standard PROLOG
evaluation process may fail to achieve the desirable purpose due to the so-called "negative goal
interactions" [Wile83]. For instance, performing one operation may accidentally undo some
previously accomplished operations, or performing one operation may prevent some other operations
from being performed. To properly execute the operations, planning is required.

In the past, we have seen the major problem with linear planners: goal interactions. In many
situations, goal interactions cannot be removed by simply reordering the operator sequence by which
these goals are achieved. Rather, it requires that the operations be intermixed. Some nonlinear
planners have been proposed based on this observation. Although nonlinear planners have been proven
to be more efficient than linear ones, it can be proved that they have by no means solved
general planning problems. Since robot task planning is a special class of planning problems, in this
section we will show that some subclasses of robot task planning problems can be solved effectively.


4.1  NONLINEAR  PLANNING  AS  CONSTRAINT  SATISFACTION 

In the simplest case, a nonlinear planner can first develop a plan for each conjunct goal,
assuming that there are no interactions among these goals. Once the plans are developed, they are
merged together based on the interactions among them. Initially, each plan provides a partial ordering
among its own operations. As more interactions are uncovered, it may be necessary to move an
operator from one plan to another place (so that a certain interaction can be avoided) and consequently
another partial ordering constraint is developed. For instance, suppose operator p in plan x negates the
precondition of operator q of plan y, and x, y are executed in sequence initially. In this case, this
negation may be avoided by moving q so that q is executed before p. In other words, a partial ordering
constraint p > q is developed.

We shall regard this as an operator ordering problem, and an ordering criterion has been
proposed by Chapman in his Modal Truth Criterion [Chap87] as described in the following.

Modal Truth Criterion


"A proposition p is necessarily true in a situation s iff two conditions hold: there is
a situation t equal or necessarily previous to s in which p is necessarily asserted; and for every step
C possibly before s and every proposition q possibly codesignating with p which C denies¹, there is
a step W² necessarily between C and s which asserts r, a proposition such that r and p codesignate
whenever p and q codesignate. The criterion for possible truth is exactly analogous, with all the
modalities switched (read "necessary" for "possible" and vice versa)."

For the operator ordering problem, consequently, let $p$ be an ordering of operators $(f_1, \ldots, f_n)$,

$p = (f_i, a^i, e^i)$

where $e^i$ stands for the effects created by operator $f_i$ and $a^i$ stands for the preconditions that
have to hold before $f_i$ can be applied. For simplicity, we assume that each $e^i$ is a proposition in
the operator ordering problem and $a^i$ is true. Assume $G = \{u_1, \ldots, u_n\}$ is a set of conjunctive
goals that have to be satisfied, and for each $u_i$, $1 \le i \le n$, there is one and only one $f_j$ such
that $e^j \Rightarrow u_i$. An operator ordering problem with the above assumptions will be called a
simple operator ordering problem later. For each $u_i$, let

$V_i = \{ f_j \mid e^j \Rightarrow u_i \} = \{ v_i \}$

$U_i = \{ f_j \mid e^j \Rightarrow \neg u_i \} = \{ u_{i1}, \ldots, u_{ik} \}$

Let $q(f_i)$ be the position of $f_i$ in $p$; we can obtain that $p$ is a plan if and only if for each $i$,
the following is true:

$[q(v_i) > q(u_{i1})] \wedge \ldots \wedge [q(v_i) > q(u_{ik})]$

Let $y_i$ denote the above formula, let $d_{ij}$ be $[q(f_i) > q(f_j)]$, and let $D$ be the set
$\{ d_{ij} \mid \text{there exists a } y_k \text{ such that } d_{ij} \in y_k \}$. The problem of searching for
an ordering of $(f_1, \ldots, f_n)$ is equivalent to the problem of instantiating
$(q(f_1), \ldots, q(f_n))$ such that $y_1 \wedge \ldots \wedge y_n$ is true. To determine if a simple
operator ordering problem is solvable, we can first enlarge $D$ to $D^{+}$ with the transitive rule:

$d_{ij} \wedge d_{jk} \rightarrow d_{ik}$, for all possible $i$, $j$, and $k$.

Now, the problem is solvable if there exist no $i$ and $j$ such that both $d_{ij}$ and $d_{ji}$ belong to $D^{+}$.

If an operator ordering problem is determined to be solvable, assuming the cardinality of $D^{+}$ is
$r$, a solution can be obtained according to Algorithm 4.1 as follows:

Algorithm 4.1

step 1

    sequence = null;

step 2

    1.  For i = 1 to n do

        If $f_i$ is not yet in sequence and there exists no j such that $d_{ij}$ belongs to $D^{+}$
        (the enlarged set), append $f_i$ to the end of sequence;

    2.  For all i identified above, remove each $d_{ki}$ for any k from $D^{+}$. If no such i could be
        identified in (1), terminate; otherwise go to step 2.

□

¹ C is called a clobberer by Chapman.

² W is called a white knight; the process which re-asserts r is called declobbering.
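
The solvability test and the ordering procedure can be sketched together in Python. The sketch assumes that a constraint $d_{ij}$ is encoded as the pair (i, j), meaning that operator i must come after operator j, and that operators are numbered 1 to n; the function names are hypothetical. The closure below is a plain fixed-point loop and makes no claim about matching the complexity stated for the algorithm.

```python
def transitive_closure(D):
    """Enlarge D with the rule d_ij and d_jk -> d_ik."""
    closure = set(D)
    changed = True
    while changed:
        changed = False
        for (i, j) in list(closure):
            for (j2, k) in list(closure):
                if j == j2 and (i, k) not in closure:
                    closure.add((i, k))
                    changed = True
    return closure


def is_solvable(closure):
    """Solvable when d_ij and d_ji never both occur in the enlarged set."""
    return not any((j, i) in closure for (i, j) in closure)


def order_operators(n, D):
    """Return operator indices in execution order, or None if unsolvable."""
    remaining = transitive_closure(D)
    if not is_solvable(remaining):
        return None
    sequence = []
    while len(sequence) < n:
        # operators not yet placed that have no outstanding "comes after" constraint
        ready = [i for i in range(1, n + 1)
                 if i not in sequence and all(a != i for (a, _) in remaining)]
        if not ready:
            return None
        sequence.extend(ready)
        # drop every d_ki whose operator i has just been placed
        remaining = {(k, i) for (k, i) in remaining if i not in ready}
    return sequence


chain = order_operators(3, {(2, 1), (3, 2)})      # f2 after f1, f3 after f2
conflict = order_operators(2, {(1, 2), (2, 1)})   # contradictory constraints
```

Each pass of the loop corresponds to one execution of step 2: it places every operator whose constraints are already satisfied and then discards the constraints that those placements fulfil.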

According to step 2, the complexity of this algorithm is O(r). The idea of treating the effects
and preconditions of operators as static, state-independent propositions and treating a conjunctive
planning problem as a constraint satisfaction problem is not new. Such an idea has been explored in
DCOMP [Nils80]. In DCOMP, a goal reduction process first develops an AND tree by expanding
top-level goals into more detailed ones, assuming that all the goals are independent. However, the
goal reduction process provides a partial ordering among higher-level goals and lower-level goals. An
analysis of possible goal interactions is then carried out. As a consequence, for each goal, two lists are
constructed. The first list, called the add list, contains all the subgoals whose effect can produce a
precondition of the associated goal. The second list, called the delete list, contains all the subgoals
whose effect can negate a precondition of the associated goal. Based on the add and delete lists,
further ordering constraints can be developed, and sometimes new steps are added to satisfy the
ordering constraints.

Like STRIPS, a hierarchical version of DCOMP, called NOAH, was proposed [Sace75]. Several
researchers [AlKo83] [McDe83] [MFDD85] [Vere83] extended NOAH by improving the
representation of time in different ways. David Wilkins's SIPE [Wilk84] further extended it with the
concept of resource, by which operations utilizing shared resources can be sequenced to avoid
conflicts, and with a mechanism that can perform simple deductions based on the effects produced by
operators. More recently, based on the Modal Truth Criterion, David Chapman's TWEAK [Chap87]
proposed a nonlinear approach that can be proved to be complete, i.e., the planner can develop a plan
if it indeed exists. However, it may never stop if such a plan does not exist, and Chapman showed that
this is the best we can hope for.

TWEAK can be regarded as the first planning mechanism which is developed based on a sound
theory. Like other nonlinear planners, it is constraint-based, so that a plan is a partial order of
operators. Basically, given a set of goals, TWEAK tries to accomplish the goals by constructing and
refining partial plans, where a partial plan is a set of steps for which some information (e.g., variables,
orders) is not specified. Constraints are developed as the planning process proceeds. A typical
constraint could be "The variables x and y cannot codesignate (i.e., be unified)."; or it could be a partial
ordering constraint like "The action x has to be executed before action y, although they do not have
to be performed back to back." At any time, the planner has to make sure that the Modal Truth
Criterion is satisfied, i.e., the preconditions of an action are always true, regardless of how the current
partial plan is completed. Assuming p is a precondition for an action that has been included in a partial
plan, and s is the current situation, TWEAK employs one of the following techniques for this
purpose:

1.  Simple Establishment: If p has been established in s, the planner constrains a step that
    establishes p to occur before s.

2.  Step Addition: A step is added right before s in order to assert p.

3.  Promotion: A clobberer is moved so that it will happen after s.

4.  Separation: If q is a clobbering proposition which possibly codesignates with p, the planner
    constrains the partial plan by not allowing q to codesignate with p.

5.  Declobbering by White Knight: Insert a white knight to avoid clobbering.

Many of the above techniques had already been employed in previous planners, where they were
proposed as heuristics. However, Chapman proved that the above techniques are necessary and
sufficient for constructing a complete and correct planner, and this is the major contribution of
TWEAK.

Based on the above, a TWEAK planner can work in a straightforward way. Initially, given a
problem, the plan is empty. The planner always looks for an unaccomplished goal to satisfy, and
partial ordering constraints are incrementally introduced based on the Modal Truth Criterion as the
planning process proceeds. One of the above techniques (e.g., promotion, white knights) is used to
resolve a possible goal interaction whenever it occurs, and possibly new operators are introduced. If
several alternatives exist to order an operator, to choose an operator, or to choose a goal, a
nondeterministic choice is made. Whenever a new constraint is inconsistent with the existing ones,
dependency-directed backtracking is invoked. Chapman showed that virtually all existing nonlinear
planners could be regarded as a special case of TWEAK; and indeed he has cleaned up much of the
research on general-purpose planning in the last decade. However, it should be noted that TWEAK
has by no means solved the general planning problem [Amst87]. For instance, it may not develop an
optimal plan, and the restricted form of propositions on which the Modal Truth Criterion is based
makes it impossible to represent nontrivial domains. A more detailed discussion about its weaknesses
can be found in [Chap87] [Amst87].


Example  4-1 


Assume that the initial configuration and the goal configuration of a blocks world problem are given
as shown in Figure 4.1(a) and Figure 4.1(b), respectively. Also assume that the available robot
operations are puton and putdown (see Figure 4.2), where X stands for postconditions and P stands
for preconditions. Figure 4.3 shows how a constraint-based planner can solve this problem by
incrementally establishing a plan. Note that in Figure 4.3(a), an arc between two operators a → b
designates a precedence relationship, which is derived from a constraint discovered in
Figure 4.3(b).
(a) initial configuration

(b) goal configuration (block labels, rows from the top: F C / E D / B A)


Figure 4.1 A blocks world problem
puton(x,y):
    P: clear(x), clear(y)
    X: (if on(x,z) then ¬on(x,z)), on(x,y)

putdown(x):
    P: clear(x)
    X: if on(x,z) then ¬on(x,z)


Figure 4.2 Robot operations
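
The two operations of Figure 4.2 can be read as functions on a state. The following is a minimal
sketch under stated assumptions: a state is a frozenset of `(top, bottom)` pairs standing for
on(top, bottom), a block with no pair is taken to rest on the table, and the helper `clear` is a
hypothetical name introduced here. The variable z in the postconditions is read as "whatever x
currently rests on".

```python
def clear(state, x):
    """clear(x): no block rests on x."""
    return all(bottom != x for (_, bottom) in state)


def puton(state, x, y):
    """P: clear(x), clear(y).  X: retract any on(x, z), then assert on(x, y)."""
    if x == y or not (clear(state, x) and clear(state, y)):
        raise ValueError("preconditions of puton do not hold")
    remaining = {(top, bottom) for (top, bottom) in state if top != x}
    remaining.add((x, y))
    return frozenset(remaining)


def putdown(state, x):
    """P: clear(x).  X: retract any on(x, z)."""
    if not clear(state, x):
        raise ValueError("preconditions of putdown do not hold")
    return frozenset((top, bottom) for (top, bottom) in state if top != x)


start = frozenset({("A", "B")})      # A rests on B
on_table = putdown(start, "A")       # A and B both on the table
swapped = puton(on_table, "B", "A")  # B now rests on A
```

Returning a new frozenset instead of mutating the state keeps earlier states available, which is
convenient when a planner has to backtrack.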
Figure 4.3 Execution of a non-linear planner

Example 4-2

This example is another illustration of a constraint-based planning process, assuming that the initial
configuration and the goal configuration of a blocks world are given as shown in Figure 4.4(a) and
Figure 4.4(b), respectively. Also assume that the available robot operations are the same as those in
Example 4-1. Figure 4.5 shows the flow of the process. Note that in this case the planner is not able
to derive an optimal plan, as blocks F and E need not be put down before being stacked on A and F,
respectively.

□ 


(a)  initial  configuration 


(b)  goal  configuration 


Figure  4.4  A  blocks  world  problem 
Figure 4.5 Execution of a non-linear planner
4.2 THE CONSUMER ORDERING PROBLEM

In some cases, the operators in an operator ordering problem work under the same set of
resources, and each of them consumes a (different) subset of the resources in order to accomplish
its corresponding goal. On the other hand, the applicability of each operator is determined by the
availability of certain resources. We call such a problem a consumer ordering problem.

Given a set of goals $(u_1, \ldots, u_n)$ and a set of corresponding operators $(f_1, \ldots, f_n)$, a
consumer ordering problem can be formulated as follows. Let $R$ be a set of resources and let $R_i$ be
the set of resources that will be consumed by operator $f_i$ after it is applied. At each state $s_j$
(which is defined to be the state of the system after the first $j$ operators have been applied), in
addition to the requirement that $R_i$ has to be available, let $p_i(s_j)$ be the function that
determines the applicability of $f_i$ (i.e., it is true if $f_i$ can be applied at state $s_j$, and it
is false otherwise). Assuming the initial state is $s_0$, the corresponding consumer ordering problem
is to find an ordering $\rho$ of $f_1, \ldots, f_n$ such that if $f_i$ is the $(j+1)$th operator to be
applied, $p_i(s_j)$ is true and $R_i$ is available.

Clearly, the approach to solving a consumer ordering problem should be different from that of a
simple operator ordering problem. This is because the preconditions of each operator are state
dependent (except that the resources to be consumed are known), and therefore we cannot analyze the
operators for potential goal conflicts. However, we observe that when the last operator is applied,
the state of the system is completely predictable (i.e., the resources consumed by the other
operators are known). This suggests that we can identify the last operator by choosing among the
operators one that is applicable, assuming all the other operators have been applied. If more than
one such operator can be identified, they can be applied in an arbitrary order. Subsequently, we can
remove these operators from further consideration, and the process is repeated until all the
operators are identified. We shall call this procedure a backward operator search process. Formally,
this process can be described as follows, assuming the notations defined earlier are used:

Algorithm 4.2

step 1

    sequence = null;
    remained = $\{f_1, \ldots, f_n\}$;
    $R' = R$;

step 2

    1. $Q = \emptyset$;

    2. For each $f_i$ in remained do

        Let $D = R' - R_i$ and let $s$ be the state at which the resources in $D$ have been occupied;

        If the intersection of $(R_1 \cup \ldots \cup R_{i-1} \cup R_{i+1} \cup \ldots \cup R_n)$ and $R_i$
        is not empty, there is a direct resource conflict, so terminate; no solution exists for the
        problem.

        If $p_i(s)$ is true, append $f_i$ to the end of sequence, let $R' = R' - R_i$, and let
        $Q = Q \cup \{f_i\}$;

    3. For each $f_i$ in $Q$, remove $f_i$ from remained; if remained is $\emptyset$ terminate, else go
       to step 2.

□ 
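
The backward operator search can be sketched in a few lines. This is an illustration under stated
assumptions rather than the report's implementation: operators are named by strings, `consumes` maps
each operator to its resource set, `applicable` maps each operator to a predicate over the set of
occupied resources, and every resource in `resources` is taken to be occupied once all operators
have run. The sketch collects operators last-first and reverses the list at the end, and it returns
None when a round makes no progress; both choices are assumptions added here. The stacking example
(positions p4 above p5 above p6) is hypothetical.

```python
def backward_operator_search(resources, consumes, applicable):
    ops = list(consumes)
    # Two consumers that need the same resource form a direct resource conflict.
    for i, a in enumerate(ops):
        for b in ops[i + 1:]:
            if consumes[a] & consumes[b]:
                return None
    occupied = set(resources)          # resources occupied after all remaining operators ran
    remaining, reversed_plan = ops, []
    while remaining:
        found = []
        for op in remaining:
            state = frozenset(occupied - consumes[op])   # everything else already applied
            if applicable[op](state):
                reversed_plan.append(op)                 # op can be the last remaining operator
                occupied -= consumes[op]
                found.append(op)
        if not found:
            return None                                  # no operator can come last
        remaining = [op for op in remaining if op not in found]
    return reversed_plan[::-1]


BELOW = {"p4": "p5", "p5": "p6", "p6": None}
ABOVE = {"p6": "p5", "p5": "p4", "p4": None}


def make_put(pos):
    def can_put(occupied):
        supported = BELOW[pos] is None or BELOW[pos] in occupied
        uncovered = ABOVE[pos] is None or ABOVE[pos] not in occupied
        return supported and uncovered
    return can_put


consumes = {"put_C": {"p6"}, "put_B": {"p5"}, "put_A": {"p4"}}
applicable = {name: make_put(next(iter(res))) for name, res in consumes.items()}
plan = backward_operator_search({"p4", "p5", "p6"}, consumes, applicable)
```

In this example put_A is identified first (as the last operator), then put_B, then put_C, so the
returned execution order is put_C, put_B, put_A.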


4.3 THE PRODUCER ORDERING PROBLEM

A producer ordering problem is like a consumer ordering problem, except that instead of consuming
some resources, each operator produces some resources. As before, the applicability of each operator
is determined by a state-dependent function.

Given a set of goals $(u_1, \ldots, u_n)$ and a set of corresponding operators $(f_1, \ldots, f_n)$, a
producer ordering problem can be formulated as follows. Let $R$ be a set of resources and let $R_i$ be
the set of resources that will be produced by operator $f_i$ after it is applied. At each state $s_j$
(which is defined to be the state of the system after the first $j$ operators have been applied), let
$p_i(s_j)$ be the function that determines the applicability of $f_i$ (i.e., it is true if $f_i$ can be
applied at state $s_j$, and it is false otherwise). Assuming the initial state is $s_0$, the
corresponding producer ordering problem is to find an ordering $\rho$ of $f_1, \ldots, f_n$ such that if
$f_i$ is the $(j+1)$th operator to be applied, $p_i(s_j)$ is true.

A producer ordering problem as described above can be solved by a forward search process.
Starting from the initial state, we first identify a set of operators that can be applied. Since
each operator simply produces some resources, the order of these operators is irrelevant. The above
process can be repeated, assuming all the operators just identified have been applied. This process
continues until no operators remain to be identified. Formally, this process can be described as
follows, assuming the notations defined earlier are used:

Algorithm 4.3

step 1

    sequence = null;
    remained = $\{f_1, \ldots, f_n\}$;
    $R$ = the set of resources that are available initially;

step 2

    1. $Q = \emptyset$;

    2. For each $f_i$ in remained do

        Let $D = R \cup \{R_j \mid f_j \in Q\}$ and let $s$ be the state at which all the resources in
        $D$ are available.

        If $p_i(s)$ is true, append $f_i$ to the end of sequence and let $Q = Q \cup \{f_i\}$; if
        remained is not empty and no such $f_i$ exists, there is no solution to the problem and
        terminate.

    3. For each $f_i$ in $Q$, remove $f_i$ from remained; if remained is $\emptyset$ terminate, else go
       to step 2.

□ 
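
The forward search can be sketched in the same style. Again this is a hedged illustration, not the
report's code: `produces` maps each operator name to the resources it makes available, `applicable`
maps each operator to a predicate over the set of available resources, and the sketch keeps the
resources produced in earlier rounds available in later rounds, which is an assumption about how the
set $D$ is meant to grow. The unstacking example (position p1 above p2 above p3) is hypothetical.

```python
def forward_operator_search(initial, produces, applicable):
    available = set(initial)           # resources available so far
    remaining, plan = list(produces), []
    while remaining:
        found = []
        for op in remaining:
            if applicable[op](frozenset(available)):
                plan.append(op)
                available |= produces[op]   # the operator only adds resources
                found.append(op)
        if not found:
            return None                      # no remaining operator is applicable
        remaining = [op for op in remaining if op not in found]
    return plan


ABOVE = {"p3": "p2", "p2": "p1", "p1": None}


def make_remove(pos):
    def can_remove(free):
        # A position can be vacated only when nothing sits above it.
        return ABOVE[pos] is None or ABOVE[pos] in free
    return can_remove


produces = {"remove_C": {"p3"}, "remove_B": {"p2"}, "remove_A": {"p1"}}
applicable = {name: make_remove(next(iter(res))) for name, res in produces.items()}
plan = forward_operator_search(set(), produces, applicable)
```

Here each round frees exactly one position, so the plan unstacks from the top down: remove_A,
remove_B, remove_C.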
4.4 THE CONSUMER-PRODUCER ORDERING PROBLEM

In addition to consumer ordering problems and producer ordering problems, there are situations
in which the resources need to be reorganized, which requires both consumers and producers. We
call such problems consumer-producer ordering problems.

Given a set of goals $(u_1, \ldots, u_n)$, a set of producer operators $(g_1, \ldots, g_n)$, and a set
of consumer operators $(f_1, \ldots, f_n)$, a consumer-producer ordering problem can be formulated as
follows. Let $R$ be a set of resources, let $G_i$ be the set of resources that will be produced by
operator $g_i$ after it is applied, and let $R_i$ be the set of resources that will be consumed by
operator $f_i$ after it is applied. At each state $s_j$ (which is defined to be the state of the system
after the first $j$ operators have been applied), in addition to the requirement that $R_i$ has to be
available, let $p_i(s_j)$ be the function that determines the applicability of $f_i$ (i.e., it is true
if $f_i$ can be applied at state $s_j$, and it is false otherwise). Similarly, at each state $s_j$, in
addition to the requirement that $G_i$ has to be available, let $q_i(s_j)$ be the function that
determines the applicability of $g_i$ (i.e., it is true if $g_i$ can be applied at state $s_j$, and it
is false otherwise). Assuming the initial state is $s_0$, we can formulate the corresponding
consumer-producer ordering problem as finding an ordering $\rho$ of $f_1, \ldots, f_n, g_1, \ldots, g_n$
such that if $f_i$ is the $(j+1)$th operator to be applied, $p_i(s_j)$ is true and $R_i$ is available.
Furthermore, if $g_i$ is the $(j+1)$th operator to be applied, $q_i(s_j)$ is true and $G_i$ is
available.

Clearly, a consumer-producer ordering problem as formulated above can be solved easily by a
two-phase approach. In the first phase, we sequence the producer operations according to Algorithm
4.3. In the second phase, the consumer operations are sequenced according to Algorithm 4.2. The
consumer-producer ordering problem can be made more interesting by introducing the concept of a
"consumer-producer-pair". First, we shall assume that the number of consumer operations and the
number of producer operations are the same. Second, for each $g_i$, $1 \le i \le n$, an $f_j$,
$1 \le j \le n$, is assigned to be its partner. Now, the consumer-producer ordering problem is to
sequence $f_1, \ldots, f_n, g_1, \ldots, g_n$ into a plan for which the number of
consumer-producer-pairs is maximal, where we define a consumer-producer-pair of a plan to be any
$(f_i, g_j)$ pair, where $g_j$ is the producer partner of $f_i$, that can be executed consecutively.
It should be noted that any plan generated by the two-phase approach described earlier would have at
most one consumer-producer-pair.
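
The quantity being maximized can be made concrete with a short sketch. It assumes that "executed
consecutively" means a producer immediately followed by its consumer partner in the plan; the
function name `count_pairs` and the operator names are hypothetical.

```python
def count_pairs(plan, partner):
    """Count producers that are immediately followed by their consumer partner."""
    return sum(1 for first, second in zip(plan, plan[1:])
               if partner.get(first) == second)


partner = {"g1": "f1", "g2": "f2", "g3": "f3"}            # producer -> consumer partner
two_phase = ["g1", "g2", "g3", "f3", "f2", "f1"]          # all producers, then all consumers
interleaved = ["g3", "f3", "g2", "f2", "g1", "f1"]        # each producer followed by its partner
counts = (count_pairs(two_phase, partner), count_pairs(interleaved, partner))
```

In a two-phase plan only the last producer and the first consumer are adjacent, which is why such a
plan contains at most one pair; here the two plans score 1 and 3 respectively.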

A  consumer-producer  ordering  problem  as  formulated  above  can  be  solved  in  the  following  way. 
First,  identify  a  partial  ordering  among  the  producer  operations  according  to  Algorithm  4.5  (see 
below).  Second,  identify  a  partial  ordering  among  the  consumer  operations  according  to  Algorithm  4.4 
(see  below),  assuming  that  all  producer  operations  have  been  applied. 

Algorithm 4.4 (Partial Ordering Consumers)

step 1

    sequence = null;
    remained = $\{f_1, \ldots, f_n\}$;
    index = 0;
    $R' = R$;

step 2

    1. index = index + 1; $T_{index} = \emptyset$;

    2. For each $f_i$ in remained do

        Let $D = R' - R_i$ and let $s$ be the state at which the resources in $D$ have been occupied;

        If the intersection of $(R_1 \cup \ldots \cup R_{i-1} \cup R_{i+1} \cup \ldots \cup R_n)$ and $R_i$
        is not empty, there is a direct resource conflict, so terminate; no solution exists for the
        problem.

        If $p_i(s)$ is true, $T_{index} = T_{index} \cup \{f_i\}$ and $R' = R' - R_i$;

    3. For each $f_j$ in $T_{index}$, remove $f_j$ from remained; if remained is $\emptyset$ go to
       step 4; else go to step 1.

    4. For $i$ = index$-1$ to 1 do
           For each $f_j$ in $T_{i+1}$ do
               For each $f_k$ in $T_i$ do

□ 


Algorithm 4.5 (Partial Ordering Producers)

step 1

    sequence = null;
    remained = $\{f_1, \ldots, f_n\}$;
    index = 1;
    $R$ = the set of resources that are available initially;
    $Q = \emptyset$;

step 2

    1. $T_{index} = \emptyset$;

    2. For each $f_i$ in remained do

        Let $D = R \cup \{R_j \mid f_j \in Q\}$ and let $s$ be the state at which all the resources in
        $D$ are available.

        If $p_i(s)$ is true, $T_{index} = T_{index} \cup \{f_i\}$; $Q = Q \cup \{f_i\}$; if remained is
        not empty and no such $f_i$ exists, there is no solution to the problem and terminate.

    3. For each $f_j$ in $T_{index}$, remove $f_j$ from remained; if remained is $\emptyset$ go to
       step 4, else index = index + 1 and go to step 1.

    4. For $i$ = 1 to index$-1$ do
           For each $f_j$ in $T_i$ do
               For each $f_k$ in $T_{i+1}$ do

□

The partial ordering developed in Algorithm 4.4 designates the minimal (logical) sequence requirement
for the consumers. Similarly, the partial ordering developed in Algorithm 4.5 designates the minimal
(logical) sequence requirement for the producers. Based on these basic requirements, the following
algorithm develops a plan that maximizes the number of consumer-producer-pairs. This algorithm
develops a plan incrementally. At each step, assuming the state is $s$, it looks for a
consumer-producer-pair $(f_i, g_j)$ to apply, where (1) both $f_i$ and $g_j$ have no precedents
according to the basic ordering requirements, (2) $g_j$ is executable under $s$, (3) $f_i$ is
executable under $s \cdot g_j$, and (4) $f_i$ does not block any producer in the future. If such an
$(f_i, g_j)$ cannot be found, it chooses any applicable producer operator and applies it. This process
repeats until no producers remain to be ordered; at this point all the remaining consumer operations
are applied following the basic ordering requirements.

Algorithm 4.6

step 1

    If there remains no $g_j$ to be ordered, append each remaining $f_i$ to plan, following the basic
    ordering requirements, and terminate.

step 2

    Compute $I = \{(f_i, g_j) \mid (f_i, g_j)$ is a consumer-producer-pair, there exists no $f_k$ such
    that $f_k < f_i$, and there exists no $g_k$ such that $g_k < g_j\}$. If $I$ is $\emptyset$, choose
    any $g_j$ for which there exists no $g_k$ such that $g_k < g_j$, and append $g_j$ to plan.

step 3

    Choose any $(f_i, g_j)$ in $I$ for which (1) $g_j$ is executable under $s$, (2) $f_i$ is executable
    under $s \cdot g_j$, and (3) $f_i$ does not block any producer in the future. If such an
    $(f_i, g_j)$ can be found, do the following:

    a. Append $(f_i, g_j)$ to plan;

    b. $s = s \cdot g_j$;

    c. $s = s \cdot f_i$;

    d. Remove any partial ordering requirement in which $f_i$ takes precedence.

    e. Remove any partial ordering requirement in which $g_j$ takes precedence.

    f. Go to step 2.

    If no such pair can be found, choose any $g_j$ for which there exists no $g_k$ such that
    $g_k < g_j$, and append $g_j$ to plan. Also remove any partial ordering requirement involving $g_j$
    and go to step 2.

□ 

As a final remark, the consumer-producer ordering algorithm discussed above shares a feature with
some other planning systems: incremental planning, as in the approach proposed by Waldinger
[Wald77]. In Waldinger's approach, the subgoals are solved one at a time, and each solution is
followed by a check for goal violations. Whenever a proposed operator violates a protected goal, it
is inserted at an earlier point in the partial plan. The check for goal violations is then conducted
again and, if another violation occurs, the operator is inserted at another, yet earlier, point in
the plan. However, Waldinger's approach is not guided by any information, and therefore it could be
inefficient.


4.5 SOLVING BLOCKS WORLDS AS C-P PROBLEMS

Clearly, a blocks world problem can be regarded as a consumer-producer problem. To see this,
we shall assume that the following operators are available:

1. remove(block_id, position)

2. put(block_id, position)

Note that these two types of operators are sufficient to solve a blocks world problem. To convert a
blocks world problem into a consumer-producer problem, the remove operators can be regarded as
producer operations, and the put operators can be regarded as consumer operations. Given the initial
configuration and the final configuration of a blocks world, the problem is to remove all the blocks
in the initial configuration and put the blocks in their desired positions. Assuming there are $n$
blocks, it is clear that both the number of producer operations and the number of consumer operations
are $n$. For each block, the corresponding put and remove operators form a consumer-producer pair.
Furthermore, the functions $p_i$ and $q_i$, $1 \le i \le n$, as described earlier, can be determined
based on the following physical constraints:

1. No block can be moved to a position that is not supported.

2. No block can be removed from a position that supports another object.

A  blocks  world  problem  can  then  be  solved  by  Algorithm  4.6. 
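
The conversion described above can be sketched as a small function. It assumes that a configuration
is given as a mapping from block name to position name and that both configurations mention the same
blocks; the function name `to_cp_problem` and the tuple encoding of operators are hypothetical.

```python
def to_cp_problem(initial, goal):
    """Build the producer and consumer operators for a blocks world problem.

    initial, goal: dicts mapping each block to its position.
    Returns (producers, consumers, partner), where partner maps every
    remove operator to the put operator for the same block.
    """
    if set(initial) != set(goal):
        raise ValueError("both configurations must contain the same blocks")
    blocks = sorted(initial)
    producers = [("remove", b, initial[b]) for b in blocks]
    consumers = [("put", b, goal[b]) for b in blocks]
    partner = dict(zip(producers, consumers))
    return producers, consumers, partner


producers, consumers, partner = to_cp_problem(
    {"A": "p1", "B": "p2", "C": "p3"},
    {"A": "p4", "B": "p5", "C": "p6"},
)
```

With n blocks this yields n producers and n consumers, one consumer-producer pair per block. The
applicability functions are not built here, since they depend on which positions support which.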

Example  4-3 

Assume  that  the  initial  configuration  and  the  goal  configuration  of  a  blocks  world  are  given  as  in  Fig¬ 
ure  4.6.  Formulating  this  problem  as  a  consumer-producer  problem,  only  two  types  of  operations  are 
needed:  remove  and  put  (see  Figure  4.7).  Figure  4.8  shows  the  flow  of  Algorithm  4.6. 

□ 


(a) initial configuration: C,p3 on A,p1; B,p2 standing alone

(b) goal configuration: one stack, listed top to bottom: A,p4 / B,p5 / C,p6


Figure  4.6  A  blocks  world  problem 
remove(x,p)
    P: removable(x,p,s)

put(x,p)
    P: putable(x,p,s)

Figure  4.7  Robot  operations 


remove(A,p1), remove(B,p2), remove(C,p3)
put(A,p4), put(B,p5), put(C,p6)


Consumers: 1: put(A,p4), 2: put(B,p5), 3: put(C,p6)

Producers: 4: remove(A,p1), 5: remove(B,p2), 6: remove(C,p3)
Initial Partial Ordering: 6<4, 3<2, 2<1
Consumer-Producer-Pair Found in 1st Iteration: (6,3)

Partial Ordering Constraints after 1st Iteration: 3<2, 2<1
Consumer-Producer-Pair Found in 2nd Iteration: (5,2)

Partial Ordering Constraints after 2nd Iteration: None
Consumer-Producer-Pair Found in 3rd Iteration: (4,1)


Figure  4.8  Execution  of  Algorithm  4.6 
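
The pairs reported in Figure 4.8 can be reproduced with a small sketch of the greedy pairing loop.
This is an illustration under stated assumptions, not the report's implementation: operators are
numbered as in the figure, the initial partial ordering is supplied as `(before, after)` pairs, a
state is a dict from occupied position to block, and the support relation `BELOW` (C at p3 on A at p1,
and the goal stack p4 on p5 on p6) is inferred from the ordering constraints. The names `pair_plan`,
`executable`, `apply_op` and `blocks` are hypothetical, and `blocks` defaults to "never blocks".

```python
def pair_plan(producers, partner, prec, state, executable, apply_op,
              blocks=lambda s, f: False):
    done, plan = set(), []

    def free(op):                      # op has no unplaced predecessor
        return all(a in done for (a, b) in prec if b == op)

    def place(op):
        nonlocal state
        state = apply_op(state, op)
        plan.append(op)
        done.add(op)

    while any(g not in done for g in producers):
        ready = [g for g in producers
                 if g not in done and free(g) and executable(state, g)]
        if not ready:
            return None                # assumption: give up if no producer applies
        pair = None
        for g in ready:
            f, after_g = partner[g], apply_op(state, g)
            if (f not in done and free(f) and executable(after_g, f)
                    and not blocks(after_g, f)):
                pair = (g, f)
                break
        if pair:
            place(pair[0])
            place(pair[1])
        else:
            place(ready[0])            # no pair available: apply a producer alone
    consumers = [partner[g] for g in producers]
    while any(f not in done for f in consumers):
        nxt = next((f for f in consumers
                    if f not in done and free(f) and executable(state, f)), None)
        if nxt is None:
            return None
        place(nxt)
    return plan


BELOW = {"p1": None, "p2": None, "p3": "p1", "p6": None, "p5": "p6", "p4": "p5"}
ABOVE = {low: high for high, low in BELOW.items() if low}
OPS = {4: ("remove", "A", "p1"), 5: ("remove", "B", "p2"), 6: ("remove", "C", "p3"),
       1: ("put", "A", "p4"), 2: ("put", "B", "p5"), 3: ("put", "C", "p6")}


def executable(state, op):
    kind, block, pos = OPS[op]
    if kind == "remove":               # the block is there and supports nothing
        return state.get(pos) == block and ABOVE.get(pos) not in state
    supported = BELOW[pos] is None or BELOW[pos] in state
    return pos not in state and block not in state.values() and supported


def apply_op(state, op):
    kind, block, pos = OPS[op]
    new_state = dict(state)
    if kind == "remove":
        del new_state[pos]
    else:
        new_state[pos] = block
    return new_state


plan = pair_plan([4, 5, 6], {4: 1, 5: 2, 6: 3}, {(6, 4), (3, 2), (2, 1)},
                 {"p1": "A", "p2": "B", "p3": "C"}, executable, apply_op)
```

For this input the sketch places the pairs (6,3), (5,2) and (4,1) in that order, matching the pairs
listed in the figure. Instead of deleting satisfied ordering requirements, it tracks which operators
have been placed, which has the same effect.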
Example 4-4

Assume  that  the  initial  configuration  and  the  goal  configuration  of  a  blocks  world  are  given  as  shown 
in  Figure  4.9.  Formulating  this  problem  as  a  consumer-producer  problem,  the  possible  operators  are 
shown  in  Figure  4.10.  Figure  4.11  shows  the  flow  of  Algorithm  4.6. 

□ 


(a) initial configuration: two stacks, listed top to bottom: A,p1 / B,p2 / C,p3 and F,p6 / E,p5 / D,p4

(b) goal configuration: two stacks, listed top to bottom: F,p1 / E,p2 / B,p3 and C,p6 / D,p5 / A,p4


Figure  4.9  A  blocks  world  problem 


remove(x,p)
    P: removable(x,p,s)
put(x,p)
    P: putable(x,p,s)


remove(A,p1), remove(B,p2), remove(C,p3),
remove(D,p4), remove(E,p5), remove(F,p6)
put(A,p4), put(B,p3), put(C,p6),
put(D,p5), put(E,p2), put(F,p1)


Figure  4.10  Robot  operations 
Consumers: 1: put(A,p4), 2: put(B,p3), 3: put(C,p6),
           4: put(D,p5), 5: put(E,p2), 6: put(F,p1)

Producers: 7: remove(A,p1), 8: remove(B,p2), 9: remove(C,p3),
           10: remove(D,p4), 11: remove(E,p5), 12: remove(F,p6)
Initial Partial Ordering: 7<8<9, 12<11<10, 2<5<6, 1<4<3
Consumer-Producer-Pair Found in 1st Iteration: None
Producer Applied in 1st Iteration: 12

Partial Ordering Constraints after 1st Iteration: 7<8<9, 2<5<6, 11<10, 1<4<3
Consumer-Producer-Pair Found in 2nd Iteration: None
Producer Applied in 2nd Iteration: 7

Partial Ordering Constraints after 2nd Iteration: 8<9, 11<10, 2<5<6, 1<4<3
Consumer-Producer-Pair Found in 3rd Iteration: None
Producer Applied in 3rd Iteration: 8

Partial Ordering Constraints after 3rd Iteration: 11<10, 2<5<6, 1<4<3
Consumer-Producer-Pair Found in 4th Iteration: None
Producer Applied in 4th Iteration: 11

Partial Ordering Constraints after 4th Iteration: 2<5<6, 1<4<3
(All producers have been applied at this point)

Consumers Applied Finally: 2, 1, 5, 4, 3, 6


Figure  4. 11  Execution  of  Algorithm  4.6 
Example 4-5

Assume  that  the  initial  configuration  and  the  goal  configuration  of  a  blocks  world  are  given  as  shown 
in  Figure  4.12.  Formulating  this  problem  as  a  consumer-producer  problem,  the  possible  operators  are 
shown  in  Figure  4.13.  Figure  4.14  shows  the  flow  of  Algorithm  4.6.  Note  that  in  this  case  Algorithm 
4.6  is  able  to  develop  an  optimal  plan. 


(a) initial configuration: two stacks, listed top to bottom: A,p1 / B,p2 / C,p3 and F,p6 / E,p5 / D,p4


Figure  4.12  A  blocks  world  problem 


remove(x,p)
    P: removable(x,p,s)
put(x,p)
    P: putable(x,p,s)


remove(A,p1), remove(B,p2), remove(C,p3),
remove(D,p4), remove(E,p5), remove(F,p6)
put(A,p1), put(B,p2), put(C,p3),
put(D,p7), put(E,p8), put(F,p9)


Figure 4.13 Robot operations
Consumers: 1: put(D,p7), 2: put(E,p8), 3: put(F,p9)

Producers: 4: remove(D,p4), 5: remove(E,p5), 6: remove(F,p6)
Initial Partial Ordering: 6<5<4, 3<2<1
Consumer-Producer-Pair Found in 1st Iteration: (6,3)

Producer Applied in 1st Iteration: 6
Partial Ordering Constraints after 1st Iteration: 5<4, 2<1
Consumer-Producer-Pair Found in 2nd Iteration: (5,2)
Producer Applied in 2nd Iteration: 5
Consumer-Producer-Pair Found in 3rd Iteration: (4,1)

Producer Applied in 3rd Iteration: 4

Partial Ordering Constraints after 3rd Iteration: None

(All producers and consumers have been applied at this point)


Figure  4.14  Execution  of  Algorithm  4.6 